Zeebe is a cloud-native workflow engine for microservices orchestration

  • Define workflows graphically in BPMN 2.0
  • Choose any gRPC-supported programming language
  • Deploy with Docker and Kubernetes (in the cloud or on-premises)
  • Build workflows that react to events from Apache Kafka and other messaging platforms
  • Scale horizontally to handle very high throughput
  • Achieve fault tolerance (no relational database required)
  • Export workflow data for monitoring and analysis
  • Engage with an active community

First Steps

Introduction to Zeebe

This section will help you understand what Zeebe is, how to get started with Zeebe, and how to get in touch with the community and the maintainers to ask questions.

  • What is Zeebe?: This writeup is the best place to start if you're brand new to Zeebe and microservices orchestration.

  • Install: Download a Zeebe distribution or use Docker to run Zeebe.

  • Quickstart: The Quickstart demonstrates the main concepts of Zeebe using only the command line client (no code writing necessary). After the Quickstart:

    • As a Java user, you should have a look at the Java client library. The Get Started Java tutorial guides you through the first steps.

    • As a Go user, you should look into the Go client library. The Get Started Go tutorial guides you through the first steps.

  • Community Contributions: Find a list of contributions from the Zeebe community and learn how to contribute.

  • Get Help & Get Involved: Ask a question via one of Zeebe's public support channels or report an issue.

What is Zeebe?

Zeebe is a workflow engine for microservices orchestration. Zeebe ensures that, once started, flows are always carried out fully, retrying steps in case of failures. Along the way, Zeebe maintains a complete audit log so that the progress of flows can be monitored. Zeebe is fault tolerant and scales seamlessly to handle growing transaction volumes.

Below, we'll provide a brief overview of Zeebe. For more detail, we recommend the "What is Zeebe?" writeup on the main Zeebe site.

What problem does Zeebe solve, and how?

A company’s end-to-end workflows almost always span more than one microservice. In an e-commerce company, for example, a “customer order” workflow might involve a payments microservice, an inventory microservice, a shipping microservice, and more:

order-process

These cross-microservice workflows are mission critical, yet the workflows themselves are rarely modeled and monitored. Often, the flow of events through different microservices is expressed only implicitly in code.

If that’s the case, how can we ensure visibility of workflows and provide status and error monitoring? How do we guarantee that flows always complete, even if single microservices fail?

Zeebe gives you:

  1. Visibility into the state of a company’s end-to-end workflows, including the number of in-flight workflows, average workflow duration, current errors within a workflow, and more.
  2. Workflow orchestration based on the current state of a workflow; Zeebe publishes “commands” as events that can be consumed by one or more microservices, ensuring that workflows progress according to their definition.
  3. Monitoring for timeouts or other workflow errors with the ability to configure error-handling paths such as stateful retries or escalation to teams that can resolve an issue manually.

Zeebe was designed to operate at very large scale, and to achieve this, it provides:

  • Horizontal scalability and no dependence on an external database; Zeebe writes data directly to the filesystem on the same servers where it’s deployed. Zeebe makes it simple to distribute processing across a cluster of machines to deliver high throughput.
  • Fault tolerance via an easy-to-configure replication mechanism, ensuring that Zeebe can recover from machine or software failure with no data loss and minimal downtime. This ensures that the system as a whole remains available without requiring manual action.
  • A message-driven architecture where all workflow-relevant events are written to an append-only log, providing an audit trail and a history of the state of a workflow.
  • A publish-subscribe interaction model, which enables microservices that connect to Zeebe to maintain a high degree of control and autonomy, including control over processing rates. These properties make Zeebe resilient, scalable, and reactive.
  • Visual workflows modeled in ISO-standard BPMN 2.0 so that technical and non-technical stakeholders can collaborate on workflow design in a widely-used modeling language.
  • A language-agnostic client model, making it possible to build a Zeebe client in just about any programming language that an organization uses to build microservices.
  • Operational ease-of-use as a self-contained and self-sufficient system. Zeebe does not require a cluster coordinator such as ZooKeeper. Because all nodes in a Zeebe cluster are equal, it's relatively easy to scale, and it plays nicely with modern resource managers and container orchestrators such as Docker, Kubernetes, and DC/OS. Zeebe's CLI (Command Line Interface) allows you to script and automate management and operations tasks.

You can learn more about these technical concepts in the "Basics" section of the documentation.

Zeebe is simple and lightweight

Most existing workflow engines offer more features than Zeebe. While having access to lots of features is generally a good thing, it can come at a cost of increased complexity and degraded performance.

Zeebe is 100% focused on providing a compact, robust, and scalable solution for orchestration of workflows. Rather than supporting a broad spectrum of features, its goal is to excel within this scope.

In addition, Zeebe works well with other systems. For example, Zeebe provides a simple event stream API that makes it easy to stream all internal data into another system such as Elasticsearch for indexing and querying.

Deciding if Zeebe is right for you

Note that Zeebe is currently in "developer preview", meaning that it's not yet ready for production and is under heavy development. See the roadmap for more details.

Your applications might not need the scalability and performance features provided by Zeebe. Or, you might need a mature set of features around BPM (Business Process Management), which Zeebe does not yet offer. In such scenarios, a workflow automation platform such as Camunda BPM could be a better fit.

Install

This page guides you through the initial installation of the Zeebe broker and Zeebe Modeler for development purposes.

If you're looking for more detailed information on how to set up and operate Zeebe, make sure to check out the Operations Guide as well.

There are two different ways to install the Zeebe broker: using Docker or downloading a distribution. Both are described below.

You'll likely also need the Zeebe Modeler for your project; see Install the Zeebe Modeler below.

Using Docker

The easiest way to try Zeebe is using Docker, which provides you with a consistent environment; we recommend it for development.

Prerequisites

  • Operating System:
    • Linux
    • Windows/MacOS (development only, not supported for production)
  • Docker

Docker configurations for docker-compose

The simplest way to try Zeebe is with the official docker-compose repository. It lets you start complex configurations with a single command and examine how they are configured when you are ready to dig into the details.

Docker configurations for starting a single Zeebe broker using docker-compose, optionally with Operate and Simple Monitor, are available in the zeebe-docker-compose repository. Further instructions for using these configurations are in the README.md in that repository.

Using Docker without docker-compose

You can run Zeebe with Docker:

docker run --name zeebe -p 26500-26502:26500-26502 camunda/zeebe:latest

Exposed Ports

  • 26500: Gateway API
  • 26501: Command API (gateway-to-broker)
  • 26502: Internal API (broker-to-broker)

Volumes

The default data volume is under /usr/local/zeebe/data. It contains all data which should be persisted.

Configuration

The Zeebe configuration is located at /usr/local/zeebe/config/application.yaml. The logging configuration is located at /usr/local/zeebe/config/log4j2.xml.

The configuration of the Docker image can also be changed using environment variables. The configuration template file also contains information about the environment variable to use for each setting.

Available environment variables:

  • ZEEBE_LOG_LEVEL: Sets the log level of the Zeebe Logger (default: info).
  • ZEEBE_BROKER_NETWORK_HOST: Sets the host address to bind to instead of the IP of the container.
  • ZEEBE_BROKER_CLUSTER_INITIALCONTACTPOINTS: Sets the contact points of other brokers in a cluster setup.

Mac and Windows users

Note: On systems that use a VM to run Docker containers, such as Mac and Windows, the VM needs at least 4GB of memory; otherwise, Zeebe might fail to start with an error similar to:

Exception in thread "actor-runner-service-container" java.lang.OutOfMemoryError: Direct buffer memory
        at java.nio.Bits.reserveMemory(Bits.java:694)
        at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
        at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
        at io.zeebe.util.allocation.DirectBufferAllocator.allocate(DirectBufferAllocator.java:28)
        at io.zeebe.util.allocation.BufferAllocators.allocateDirect(BufferAllocators.java:26)
        at io.zeebe.dispatcher.DispatcherBuilder.initAllocatedBuffer(DispatcherBuilder.java:266)
        at io.zeebe.dispatcher.DispatcherBuilder.build(DispatcherBuilder.java:198)
        at io.zeebe.broker.services.DispatcherService.start(DispatcherService.java:61)
        at io.zeebe.servicecontainer.impl.ServiceController$InvokeStartState.doWork(ServiceController.java:269)
        at io.zeebe.servicecontainer.impl.ServiceController.doWork(ServiceController.java:138)
        at io.zeebe.servicecontainer.impl.ServiceContainerImpl.doWork(ServiceContainerImpl.java:110)
        at io.zeebe.util.actor.ActorRunner.tryRunActor(ActorRunner.java:165)
        at io.zeebe.util.actor.ActorRunner.runActor(ActorRunner.java:145)
        at io.zeebe.util.actor.ActorRunner.doWork(ActorRunner.java:114)
        at io.zeebe.util.actor.ActorRunner.run(ActorRunner.java:71)
        at java.lang.Thread.run(Thread.java:748)

If you are using Docker with the default Moby VM, you can adjust the amount of memory available to the VM through the Docker preferences. Right-click on the Docker icon in the System Tray to access preferences.

If you use a Docker setup with docker-machine and your default VM does not have 4GB of memory, you can create a new one with the following command:

docker-machine create --driver virtualbox --virtualbox-memory 4000 zeebe

Verify that the Docker Machine is running correctly:

docker-machine ls
NAME        ACTIVE   DRIVER       STATE     URL                         SWARM   DOCKER        ERRORS
zeebe     *        virtualbox   Running   tcp://192.168.99.100:2376           v17.03.1-ce

Configure your terminal:

eval $(docker-machine env zeebe)

Then run Zeebe:

docker run --rm -p 26500:26500 camunda/zeebe:latest

To get the IP address of the Docker Machine running Zeebe:

docker-machine ip zeebe
192.168.99.100

Verify that you can connect to Zeebe:

telnet 192.168.99.100 26500

Download a distribution

You can always download the latest Zeebe release from the GitHub release page.

Prerequisites

  • Operating System:
    • Linux
    • Windows/MacOS (development only, not supported for production)
  • Java Virtual Machine:
    • Oracle HotSpot 11
    • OpenJDK 11

Once you have downloaded a distribution, extract it into a folder of your choice. To extract the Zeebe distribution and start the broker, Linux users can type:

tar -xzf zeebe-distribution-X.Y.Z.tar.gz -C zeebe/
./bin/broker

Windows users can download the .zip package and extract it using their favorite unzip tool. They can then open the extracted folder, navigate to the bin folder and start the broker by double-clicking on the broker.bat file.

Once the Zeebe broker has started, it should produce the following output:

23:39:13.246 [] [main] INFO  io.zeebe.broker.system - Scheduler configuration: Threads{cpu-bound: 2, io-bound: 2}.
23:39:13.270 [] [main] INFO  io.zeebe.broker.system - Version: X.Y.Z
23:39:13.273 [] [main] INFO  io.zeebe.broker.system - Starting broker with configuration {

Install the Zeebe Modeler

The Zeebe Modeler is an open-source desktop BPMN modeling application created specifically for Zeebe.

You can download the most recent Zeebe Modeler release here.

Quickstart

This tutorial helps you get to know the main concepts of Zeebe without writing a single line of code.

  1. Download the Zeebe distribution
  2. Start the Zeebe broker
  3. Deploy a workflow
  4. Create a workflow instance
  5. Complete a workflow instance
  6. Next steps

Note: Some command examples might not work on Windows if you use cmd or Powershell. For Windows users, we recommend using a bash-like shell such as Git Bash, Cygwin or MinGW for this guide.

Step 1: Download the Zeebe distribution

You can download the latest distribution from the Zeebe release page.

Extract the archive and enter the Zeebe directory.

tar -xzvf zeebe-distribution-X.Y.Z.tar.gz
cd zeebe-broker-X.Y.Z/

Inside the Zeebe directory you will find multiple directories.

tree -d
.
├── bin     - Binaries and start scripts of the distribution
├── conf    - Zeebe and logging configuration
└── lib     - Shared java libraries

Step 2: Start the Zeebe broker

To start a Zeebe broker use the broker or broker.bat file located in the bin/ folder.

./bin/broker
23:39:13.246 [] [main] INFO  io.zeebe.broker.system - Scheduler configuration: Threads{cpu-bound: 2, io-bound: 2}.
23:39:13.270 [] [main] INFO  io.zeebe.broker.system - Version: X.Y.Z
23:39:13.273 [] [main] INFO  io.zeebe.broker.system - Starting broker with configuration {

You will see some output which contains the version of the broker and configuration parameters like directory locations and API socket addresses.

To continue this guide open another terminal to execute commands using the Zeebe CLI zbctl.

We can now check the status of the Zeebe broker.

Note: By default, the embedded gateway listens to a plaintext connection but the clients are configured to use TLS. Therefore, all zbctl commands in the quickstart will specify the --insecure flag.

./bin/zbctl --insecure status
Cluster size: 1
Partitions count: 1
Replication factor: 1
Brokers:
  Broker 0 - 0.0.0.0:26501
    Partition 1 : Leader

Step 3: Deploy a workflow

A workflow is used to orchestrate loosely coupled job workers and the flow of data between them.

In this guide we will use an example process order-process.bpmn. You can download it with the following link: order-process.bpmn.

order-process

The process describes a sequential flow of three tasks: Collect Money, Fetch Items and Ship Parcel. If you open the order-process.bpmn file in a text editor, you will see that every task has a type attribute defined in the XML, which is later used as the job type.

<!-- [...] -->
<bpmn:serviceTask id="collect-money" name="Collect Money">
  <bpmn:extensionElements>
    <zeebe:taskDefinition type="payment-service" />
  </bpmn:extensionElements>
</bpmn:serviceTask>
<!-- [...] -->
<bpmn:serviceTask id="fetch-items" name="Fetch Items">
  <bpmn:extensionElements>
    <zeebe:taskDefinition type="inventory-service" />
  </bpmn:extensionElements>
</bpmn:serviceTask>
<!-- [...] -->
<bpmn:serviceTask id="ship-parcel" name="Ship Parcel">
  <bpmn:extensionElements>
    <zeebe:taskDefinition type="shipment-service" />
  </bpmn:extensionElements>
</bpmn:serviceTask>
<!-- [...] -->

To complete an instance of this workflow we would need to activate and complete one job for each of the types payment-service, inventory-service and shipment-service.

But first let's deploy the workflow to the Zeebe broker.

./bin/zbctl --insecure deploy order-process.bpmn
{
  "key": 2251799813685250,
  "workflows": [
    {
      "bpmnProcessId": "order-process",
      "version": 1,
      "workflowKey": 2251799813685249,
      "resourceName": "order-process.bpmn"
    }
  ]
}

Step 4: Create a workflow instance

After the workflow is deployed we can create new instances of it. Every instance of a workflow is a single execution of the workflow. To create a new instance we have to specify the process ID from the BPMN file, in our case the ID is order-process as defined in the order-process.bpmn:

<bpmn:process id="order-process" isExecutable="true">

Every instance of a workflow normally processes some kind of data. We can specify the initial data of the instance as variables when we start the instance.

Note: Windows users who want to execute this command using cmd or Powershell have to escape the variables differently.

  • cmd: "{\"orderId\": 1234}"
  • Powershell: '{"\"orderId"\": 1234}'
./bin/zbctl --insecure create instance order-process --variables '{"orderId": 1234}'
{
  "workflowKey": 2251799813685249,
  "bpmnProcessId": "order-process",
  "version": 1,
  "workflowInstanceKey": 2251799813685251
}

Step 5: Complete a workflow instance

To complete the instance, all three tasks have to be executed. In Zeebe, a job is created for every task that is reached during workflow instance execution. In order to finish a job, and thereby the corresponding task, it has to be activated and completed by a job worker. A job worker is a long-living process which repeatedly tries to activate jobs for a given job type and completes them after executing its business logic. zbctl also provides a command to spawn simple job workers using an external command or script. Such a job worker receives the workflow instance variables for every job as a JSON object on stdin and, if it handles the job successfully, has to return its result as a JSON object on stdout.

In this example we use the Unix command cat, which simply outputs what it receives on stdin. To complete a workflow instance, we now have to create a job worker for each of the three task types from the workflow definition: payment-service, inventory-service and shipment-service.

Note: For Windows users, this command does not work with cmd as the cat command does not exist. We recommend using Powershell or a bash-like shell to execute this command.

./bin/zbctl --insecure create worker payment-service --handler cat &
./bin/zbctl --insecure create worker inventory-service --handler cat &
./bin/zbctl --insecure create worker shipment-service --handler cat &
2019/06/06 20:54:36 Handler completed job 2251799813685257 with variables
{"orderId":1234}
2019/06/06 20:54:36 Activated job 2251799813685264 with variables
{"orderId":1234}
2019/06/06 20:54:36 Handler completed job 2251799813685264 with variables
{"orderId":1234}
2019/06/06 20:54:36 Activated job 2251799813685271 with variables
{"orderId":1234}
2019/06/06 20:54:36 Handler completed job 2251799813685271 with variables
{"orderId":1234}

With the job workers running in the background, we can create more instances of our workflow and observe how the workers complete them.

./bin/zbctl --insecure create instance order-process --variables '{"orderId": 12345}'

To close all job workers use the kill command to stop the background processes.

kill %1 %2 %3

If you want to visualize the state of the workflow instances you can start the Zeebe simple monitor.

Next steps

To continue working with Zeebe, we recommend getting more familiar with its basic concepts; see the Basics chapter of the documentation.

In the BPMN Workflows chapter you can find an introduction to creating Workflows with BPMN.

The documentation also provides getting started guides for implementing job workers using Java or Go.

Community Contributions

Zeebe welcomes extensions and contributions from the community.

We use Awesome Zeebe as a place to keep track of Zeebe ecosystem contributions, such as...

  • Clients
  • Workers
  • Exporters
  • Applications

...along with other integrations such as Spring-Zeebe and the Apache Kafka connector.

If you built something for the Zeebe ecosystem, we encourage you to add it to Awesome Zeebe via pull request.

If you're interested in contributing to the main Zeebe repository (vs. creating an extension that lives in its own repository), be sure to start with the "Contributing to Zeebe" guide on GitHub.

If you have questions about contributing, please let us know.

Go to Awesome Zeebe to see community extensions.

Get Help and Get Involved

We provide a few different public-facing Zeebe support and feedback channels so that users can ask questions, report problems, and make contributions.

Zeebe User Forum

The best place to ask questions about Zeebe and to troubleshoot issues is the Zeebe user forum.

The Zeebe team monitors the forum closely, and we do our best to respond to all questions in a timely manner.

Go to the Zeebe user forum

Public Slack Group

There's a public Zeebe Slack group where you can ask one-off questions, share community contributions, and connect with other Zeebe users.

Join the Zeebe Slack group

Create An Issue in GitHub

Did you find a problem in Zeebe? Or do you have a suggestion for an improvement?

You can create an issue in the Zeebe GitHub project to let us know.

Go to Issues in the Zeebe GitHub repository

Community Contributions

We cover community contributions in a dedicated section of the docs.

Read the Zeebe docs entry about community contributions

Release Cycle and Enterprise Support

Release Cycle

The Zeebe project follows a semantic versioning scheme, which defines a version number using the MAJOR.MINOR.PATCH pattern.

  • MAJOR version can make incompatible API changes
  • MINOR version can add functionality in a backwards compatible manner
  • PATCH version can make backwards compatible bug fixes.

The Zeebe team strives to release:

  • A new minor version of Zeebe every three months
  • In between minor versions, two alpha releases (to preview the upcoming minor version)

At the time of writing, Zeebe supports the last two released minor versions with patch releases. Patch releases are offered on a best effort basis for the currently supported versions.

Breaking Changes Before Zeebe 1.0

Given how early we are in the Zeebe journey, we're not ready to make it a 1.0 release, and we want to be transparent that this is a deliberate decision. If we do end up having to make breaking API changes, we'd rather do so before we get to 1.0 and not by moving to 2.0 just a few months after a 1.0 release.

Of course, even in Zeebe's pre-1.0 state, we'll always make our best effort to avoid breaking changes, to communicate early and often about planned changes, and if possible, to provide a migration path if we do need to make such a change.

Supported Environments

Zeebe

  • Zeebe Broker/Gateway: the cluster components of Zeebe require OpenJDK 11+ and, if the Elasticsearch exporter is used, Elasticsearch 6.8.x
  • Zeebe Java Client: the Java client for Zeebe requires OpenJDK 8+
  • Zeebe Go Client: the Go client for Zeebe requires Go 1.13+
  • zbctl: the Zeebe CLI supports the latest versions of Windows, MacOS and Linux

Camunda Operate

  • Operate Web App/Importer/Archiver: the server components of Camunda Operate require OpenJDK 11+ and Elasticsearch 6.8.x
  • Operate Browser App: requires the latest version of Chrome, Firefox or Edge on Windows, MacOS and Linux

Camunda Cloud

Zeebe is built according to cloud-native principles, and we want Zeebe to be the workflow engine for important, emerging use cases running on modern software architectures.

But even with a best-in-class architecture, operating a distributed workflow engine 24x7 can be challenging and time consuming. We've heard from a number of users who would be happy to have us run Zeebe and Operate on their behalf.

With that in mind, we have a dedicated team (in addition to the Zeebe core engineering team building the workflow engine) currently working on the first iteration of Camunda Cloud, where we'll offer Zeebe and Operate as a cloud service. This will be the first-ever cloud workflow service offered by Camunda, and we're really excited about what's ahead.

If you'd like to be notified when we open up a beta program for Camunda Cloud, you can sign up here.

Zeebe Basics

This section provides an overview of Zeebe's core concepts. Understanding them helps to successfully build workflow applications.

Architecture

There are four main components in Zeebe's architecture: the client, the gateway, the broker, and the exporter.

zeebe-architecture

Client

Clients are libraries that you embed in an application (e.g. a microservice that executes your business logic) to connect to a Zeebe cluster. Clients have two primary uses:

  • Carrying out business logic (starting workflow instances, publishing messages, working on tasks)
  • Handling operational issues (updating workflow instance variables, resolving incidents)

More about Zeebe clients:

  • Clients connect to the Zeebe gateway via gRPC, which uses HTTP/2-based transport. To learn more about gRPC in Zeebe, check out the gRPC section of the docs.
  • The Zeebe project includes officially-supported Java and Go clients, and gRPC makes it possible to generate clients in a range of different programming languages. Community clients have been created in other languages, including C#, Ruby, and JavaScript.
  • Client applications can be scaled up and down completely separately from Zeebe--the Zeebe brokers do not execute any business logic.
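
For illustration, here is a minimal sketch of connecting the Java client to a local broker or gateway. It assumes the io.zeebe.client artifact from the 0.2x releases; builder method names such as brokerContactPoint and usePlaintext may differ in other client versions.

import io.zeebe.client.ZeebeClient;

public class ConnectExample {
  public static void main(String[] args) {
    // Connect to the gateway's gRPC endpoint (port 26500 by default).
    // usePlaintext() matches the quickstart setup, where TLS is not configured.
    try (ZeebeClient client = ZeebeClient.newClientBuilder()
        .brokerContactPoint("127.0.0.1:26500")
        .usePlaintext()
        .build()) {
      // The client can now deploy workflows, create instances, and open job workers.
      System.out.println(client.newTopologyRequest().send().join().getBrokers());
    }
  }
}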

Gateway

The gateway, which proxies requests to brokers, serves as a single entry point to a Zeebe cluster.

The gateway is stateless and sessionless, and gateways can be added as necessary for load balancing and high availability.

Broker

The Zeebe broker is the distributed workflow engine that keeps state of active workflow instances.

Brokers can be partitioned for horizontal scalability and replicated for fault tolerance. A Zeebe deployment will often consist of more than one broker.

It's important to note that no application business logic lives in the broker. Its only responsibilities are:

  1. Storing and managing the state of active workflow instances

  2. Distributing work items to clients

Brokers form a peer-to-peer network in which there is no single point of failure. This is possible because all brokers perform the same kind of tasks and the responsibilities of an unavailable broker are transparently reassigned in the network.

Exporter

The exporter system provides an event stream of state changes within Zeebe. This data has many potential uses, including but not limited to:

  • Monitoring the current state of running workflow instances

  • Analysis of historic workflow data for auditing, business intelligence, etc

  • Tracking incidents created by Zeebe

The exporter includes a simple API that you can use to stream data into a storage system of your choice. Zeebe includes an out-of-the-box Elasticsearch exporter, and other community-contributed exporters are also available.

Workflows

Workflows are flowchart-like blueprints that define the orchestration of tasks. Every task represents a piece of business logic such that the ordered execution produces a meaningful result.

A job worker is your implementation of the business logic required to complete a task. A job worker must embed a Zeebe client library to communicate with the broker, but otherwise, there are no restrictions on its implementation. You can choose to write a worker as a microservice, but also as part of a classical three-tier application, as a (lambda) function, via command line tools, etc.

Running a workflow then requires two steps: submitting the workflow to Zeebe and creating job workers that can request jobs from Zeebe and complete them.
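
To make these two steps concrete, here is a rough sketch using the officially supported Java client. It assumes the 0.2x io.zeebe.client API; the class RunWorkflow and the file name order-process.bpmn are only examples.

import io.zeebe.client.ZeebeClient;

class RunWorkflow {
  static void deployAndStart(ZeebeClient client) {
    // Step 1: submit the workflow definition to Zeebe.
    client.newDeployCommand()
        .addResourceFromClasspath("order-process.bpmn")
        .send()
        .join();

    // With job workers running (see the Job Workers section), new instances
    // can be created and will be carried out by those workers.
    client.newCreateInstanceCommand()
        .bpmnProcessId("order-process")
        .latestVersion()
        .variables("{\"orderId\": 1234}")
        .send()
        .join();
  }
}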

Sequences

The simplest kind of workflow is an ordered sequence of tasks. Whenever workflow execution reaches a task, Zeebe creates a job that can be requested and completed by a job worker.

workflow-sequence

You can think of Zeebe's workflow orchestration as a state machine. A workflow instance reaches a task, and Zeebe creates a job that can be requested by a worker. Zeebe then waits for the worker to request a job and complete the work. Once the work is completed, the flow continues to the next step. If the worker fails to complete the work, the workflow remains at the current step, and the job could be retried until it's successfully completed.

Data Flow

As Zeebe progresses from one task to the next in a workflow, it can move custom data in the form of variables. Variables are key-value pairs that are part of the workflow instance.

data-flow

Every job worker can read the variables and modify them when completing a job so that data can be shared between different tasks in a workflow.

Data-based Conditions

Some workflows do not always execute the same tasks but need to choose different tasks based on variables and conditions:

data-conditions

The diamond shape with the "X" in the middle is an element indicating that the workflow decides to take one of many paths.

Events

Events represent things that happen. A workflow can react to events (catching event) and can emit events (throwing event). For example:

workflow

There are different types of events such as message or timer.

Fork / Join Concurrency

In many cases, it is also useful to perform multiple tasks in parallel. This can be achieved with Fork / Join concurrency:

data-conditions

The diamond shape with the "+" marker means that all outgoing paths are activated and all incoming paths are merged.

BPMN 2.0

Zeebe uses BPMN 2.0 for representing workflows. BPMN is an industry standard which is widely supported by different vendors and implementations. Using BPMN ensures that workflows can be interchanged between Zeebe and other workflow systems.

YAML Workflows

In addition to BPMN 2.0, Zeebe supports a YAML workflow format. It can be used to quickly write simple workflows in text. Unlike BPMN, it has no visual representation and is not standardized. Zeebe transforms YAML to BPMN on submission.

BPMN Modeler

Zeebe provides a free and open-source BPMN modeling tool to create BPMN diagrams and configure their technical properties. The modeler is a desktop application based on the bpmn.io open source project.

Zeebe Modeler can be downloaded from GitHub.

Job Workers

A job worker is a component capable of performing a particular step in a workflow.

What is a Job?

A job is a work item in a workflow. For example:

  • Processing a payment
  • Generating a PDF document
  • Updating customer data in a backend system

A job has the following properties:

  • Type: Describes the work item and is defined in each task in the workflow. The type is referenced by workers to request the jobs they are able to perform.
  • Variables: The contextual/business data of the workflow instance that is required by the worker to do its work.
  • Custom Headers: Additional static metadata defined in the workflow. Mostly used to configure a worker which is used for more than one workflow step.

Requesting Jobs from the Broker

Job workers request jobs of a certain type from the broker on a regular interval (i.e. polling). This interval and the number of jobs requested are configurable in the Zeebe client.

If one or more jobs of the requested type are available, the broker will stream activated jobs to the worker. Upon receiving jobs, a worker performs them and sends back a complete or fail message for each job depending on whether the job could be completed successfully or not.

For example, the following workflow might generate three different types of jobs: process-payment, fetch-items, and ship-parcel:

order-workflow-model

Three different job workers, one for each job type, could request jobs from Zeebe:

zeebe-job-workers-requesting-jobs

Many workers can request the same job type in order to scale up processing. In this scenario, the broker ensures that each job is sent to only one of the workers. Such a job is considered activated until it is completed, failed, or its activation times out.

On requesting jobs, the following properties can be set:

  • Worker: The identifier of the worker. Used for auditing purposes.
  • Timeout: The time a job is assigned to the worker. If a job is not completed within this time then it can be requested again from a worker.
  • MaxJobsToActivate: The maximum number of jobs which should be activated by this request.
  • FetchVariables: A list of variable names which are required. If the list is empty, all variables of the workflow instance are requested.
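
As an illustration, the properties above map to builder methods on the Java client's job worker. The sketch below assumes the 0.2x io.zeebe.client API; the worker name and job type are examples taken from the quickstart workflow.

import io.zeebe.client.ZeebeClient;
import io.zeebe.client.api.worker.JobWorker;

import java.time.Duration;

class PaymentWorker {
  static JobWorker open(ZeebeClient client) {
    return client.newWorker()
        .jobType("payment-service")                 // the type defined on the task
        .handler((jobClient, job) ->                // called for every activated job
            jobClient.newCompleteCommand(job.getKey()).send().join())
        .name("payment-worker")                     // Worker: identifier used for auditing
        .timeout(Duration.ofSeconds(30))            // Timeout: how long a job stays assigned
        .maxJobsActive(32)                          // MaxJobsToActivate per polling request
        .fetchVariables("orderId")                  // FetchVariables: only what the worker needs
        .open();
  }
}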

Long polling

Without long polling, when there are no jobs available, a request for jobs completes immediately with an empty response. To find a job to work on, the worker then needs to poll again. As a result, workers repeatedly send requests until a job becomes available, which is expensive in terms of resource usage because both the client and the server perform a lot of unproductive work. To make better use of resources, Zeebe can employ long polling for available jobs.

With long polling enabled, a request will be kept open when there are no jobs available. The request is completed when at least one job is available or after a specified duration. A worker can also specify a RequestTimeout as the duration of keeping the request open. The default request timeout is 10 seconds, but it is also configurable in the client. Long polling for available jobs can be disabled using the configuration flag: gateway.longPolling.enabled or the environment variable ZEEBE_GATEWAY_LONGPOLLING_ENABLED. It is enabled by default.

Job Queueing

Zeebe decouples creation of jobs from performing the work on them. It is always possible to create jobs at the highest possible rate, regardless of whether or not there's a worker available to work on them. This is possible because Zeebe queues jobs until workers request them. If no job worker is currently requesting jobs, jobs remain queued. Because workers request jobs from the broker, the workers have control over the rate at which they take on new jobs.

This allows the broker to handle bursts of traffic and effectively act as a buffer in front of the job workers.

Completing and Failing Jobs

After working on an activated job, a job worker can inform the broker that the job has either been completed or failed. If the job worker could successfully complete its work, it can inform the broker of this success by sending a complete job command. If the job could not be completed within the configured job activation timeout, then the broker will make the job available again to other job workers.

In order to expose the results of the job, the job worker can pass variables with the complete job command. These variables will be merged into the workflow instance depending on the output variable mapping. Note that this may overwrite existing variables and can lead to race conditions in parallel flows. We recommend completing jobs with only those variables that need to be changed.

If the job worker could not successfully complete its work, it can inform the broker of this failure by sending a fail job command. Fail job commands include the number of remaining retries. If this is a positive number then the job will be immediately activatable again, and a worker could try to process it again. If it is zero or negative however, an incident will be raised and the job will not be activatable until the incident is resolved.
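
A hedged sketch of a job handler that either completes or fails a job with the Java client (0.2x io.zeebe.client API; the shipParcel method and the trackingCode variable are made up for the example):

import io.zeebe.client.api.response.ActivatedJob;
import io.zeebe.client.api.worker.JobClient;

import java.util.Map;

class ShippingHandler {
  // Invoked by a job worker for every activated job of the corresponding type.
  static void handle(JobClient jobClient, ActivatedJob job) {
    try {
      String trackingCode = shipParcel(job.getVariablesAsMap());
      // Complete with only the variables that changed, to avoid overwriting
      // other data in parallel flows.
      jobClient.newCompleteCommand(job.getKey())
          .variables("{\"trackingCode\": \"" + trackingCode + "\"}")
          .send()
          .join();
    } catch (Exception e) {
      // Decrement the retries; once they reach zero, the broker raises an incident.
      jobClient.newFailCommand(job.getKey())
          .retries(job.getRetries() - 1)
          .errorMessage(e.getMessage())
          .send()
          .join();
    }
  }

  static String shipParcel(Map<String, Object> variables) {
    return "tracking-1234"; // placeholder for the actual business logic
  }
}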

Partitions

Note: If you have worked with Apache Kafka before, the concepts presented on this page will sound very familiar to you.

In Zeebe, all data is organized into partitions. A partition is a persistent stream of workflow-related events. In a cluster of brokers, partitions are distributed among the nodes, so a partition can be thought of as a shard. When you bootstrap a Zeebe broker, you can configure how many partitions you need.

Usage Examples

Whenever you deploy a workflow, you deploy it to the first partition. The workflow is then distributed to all partitions. On all partitions, this workflow receives the same key and version such that it can be consistently identified.

When you start an instance of a workflow, the client library will then route the request to one partition in which the workflow instance will be published. All subsequent processing of the workflow instance will happen in that partition.

Scalability

Use partitions to scale your workflow processing. Partitions are dynamically distributed in a Zeebe cluster, and for each partition there is one leading broker at a time. This leader accepts requests and performs event processing for the partition. Let us assume you want to distribute workflow processing load over five machines. You can achieve that by bootstrapping five partitions.

Note that while each partition has one leading broker, not all brokers are guaranteed to be leading a partition. A broker can lead more than one partition, and, at times, a broker in a cluster may be acting only as a replication back-up for partitions. This broker will not be doing any active work on processes until a partition fail-over happens.

Partition Data Layout

A partition is a persistent append-only event stream. Initially, a partition is empty. The first entry written takes the first position, the next entry takes the second position, and so on. Each entry has a position in the partition which uniquely identifies it.

partition

Replication

For fault tolerance, data in a partition is replicated from the leader of the partition to its followers. Followers are other Zeebe Broker nodes that maintain a copy of the partition without performing event processing.

Recommendations

Choosing the number of partitions depends on the use case, workload and cluster setup. Here are some rules of thumb:

  • For testing and early development, start with a single partition. Note that Zeebe's workflow processing is highly optimized for efficiency, so a single partition can already handle high event loads.
  • With a single Zeebe Broker, a single partition is mostly enough. However, if the node has many cores and the broker is configured to use them, then more partitions can increase the total throughput (~ 2 threads per partition).
  • Base your decisions on data. Simulate the expected workload, measure and compare the performance of different partition setups.

Protocols

Zeebe clients connect to brokers via a stateless gateway. gRPC is used for communication between client and gateway. The communication protocol is defined using Protocol Buffers v3 (proto3), and you can find it in the Zeebe repository.

What is gRPC?

gRPC was first developed by Google and is now an open-source project and part of the Cloud Native Computing Foundation. If you’re new to gRPC, the “What is gRPC” page on the project website provides a good introduction to it.

Why gRPC?

gRPC has many nice features that make it a good fit for Zeebe. It:

  • supports bi-directional streaming for opening a persistent connection and sending or receiving a stream of messages between client and server
  • uses the common HTTP/2 protocol by default
  • uses Protocol Buffers as an interface definition and data serialization mechanism–specifically, Zeebe uses proto3, which supports easy client generation in ten different programming languages

Supported clients

At the moment, Zeebe officially supports two gRPC clients: one in Java, and one in Golang.

If Zeebe does not provide an officially-supported client in your target language, you can read the official Quick Start page to find out how to create a very basic one.

You can find a list of existing clients in the Awesome Zeebe repository. Additionally, a blog post was published with a short tutorial on how to write a new client from scratch in Python.

Handling back-pressure

When a broker receives a user request, it is written to the event stream first (see the Internal Processing section for details) and processed later by the stream processor. If the processing is slow or if there are many user requests in the stream, it might take too long for the processor to start processing the command. If the broker keeps accepting new requests from the user, the backlog increases and the processing latency can grow beyond an acceptable time. To avoid such problems, Zeebe employs a back-pressure mechanism. When the broker receives more requests than it can process with an acceptable latency, it rejects some requests.

The maximum rate of requests that can be processed by a broker depends on the processing capacity of the machine, the network latency, current load of the system and so on. Hence, there is no fixed limit configured in Zeebe for the maximum rate of requests it accepts. Instead, Zeebe uses an adaptive algorithm to dynamically determine the limit of the number of inflight requests (the requests that are accepted by the broker, but not yet processed). The inflight request count is incremented when a request is accepted and decremented when a response is sent back to the client. The broker rejects requests when the inflight request count reaches the limit.

When the broker rejects requests due to back-pressure, the clients can retry them with an appropriate retry strategy. If the rejection rate is high, it indicates that the broker is constantly under high load. In that case, it is recommended to reduce the request rate.
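
As a sketch of such a client-side retry strategy, the snippet below retries a rejected command with exponential backoff. It is intentionally generic: the concrete exception type to catch depends on the client, and a real implementation should inspect the gRPC status of the failure (back-pressure rejections are reported as RESOURCE_EXHAUSTED) before retrying.

class BackpressureRetry {
  // Retry a command a limited number of times, backing off between attempts.
  static void sendWithRetry(Runnable sendCommand) throws InterruptedException {
    long backoffMs = 100;
    for (int attempt = 0; ; attempt++) {
      try {
        sendCommand.run();
        return;
      } catch (RuntimeException e) {
        if (attempt >= 5) {
          throw e; // give up after a few attempts
        }
        Thread.sleep(backoffMs);
        backoffMs *= 2; // give the broker time to work off its backlog
      }
    }
  }
}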

Internal Processing

Internally, Zeebe is implemented as a collection of stream processors working on record streams (partitions). The stream processing model is used since it is a unified approach to provide:

  • Command Protocol (Request-Response),
  • Record Export (Streaming),
  • Workflow Evaluation (Asynchronous Background Tasks)

Record export solves the history problem: The stream provides exactly the kind of exhaustive audit log that a workflow engine needs to produce.

State Machines

Zeebe manages stateful entities: Jobs, Workflows, etc. Internally, these entities are implemented as State Machines managed by a stream processor.

The concept of the state machine pattern is simple: An instance of a state machine is always in one of several logical states. From each state, a set of transitions defines the next possible states. Transitioning into a new state may produce outputs/side effects.

Let's look at the state machine for jobs. Simplified, it looks as follows:

partition

Every oval is a state. Every arrow is a state transition. Note how each state transition is only applicable in a specific state. For example, it is not possible to complete a job when it is in state CREATED.

Events and Commands

Every state change in a state machine is called an event. Zeebe publishes every event as a record on the stream.

State changes can be requested by submitting a command. A Zeebe broker receives commands from two sources:

  1. Clients send commands remotely. Examples: Deploying workflows, starting workflow instances, creating and completing jobs, etc.
  2. The broker itself generates commands. Examples: Locking a job for exclusive processing by a worker, etc.

Once received, a command is published as a record on the addressed stream.

Stateful Stream Processing

A stream processor reads the record stream sequentially and interprets the commands with respect to the addressed entity's lifecycle. More specifically, a stream processor repeatedly performs the following steps:

  1. Consume the next command from the stream.
  2. Determine whether the command is applicable based on the state lifecycle and the entity's current state.
  3. If the command is applicable: Apply it to the state machine. If the command was sent by a client, send a reply/response.
  4. If the command is not applicable: Reject it. If it was sent by a client, send an error reply/response.
  5. Publish an event reporting the entity's new state.

For example, processing the Create Job command produces the event Job Created.

Command Triggers

A state change which occurred in one entity can automatically trigger a command for another entity. Example: When a job is completed, the corresponding workflow instance shall continue with the next step. Thus, the Event Job Completed triggers the command Complete Activity.

Exporters

As Zeebe processes jobs and workflows, or performs internal maintenance (e.g. raft failover), it will generate an ordered stream of records:

record-stream

While the clients provide no way to inspect this stream directly, Zeebe can load and configure user code that can process each and every one of those records, in the form of an exporter.

An exporter provides a single entry point to process every record that is written on a stream.

With it, you can:

  • Persist historical data by pushing it to an external data warehouse
  • Export records to a visualization tool (e.g. zeebe-simple-monitor)

Zeebe will only load exporters which are configured through the main Zeebe YAML configuration file.

Once an exporter is configured, the next time Zeebe is started, the exporter will start receiving records. Note that it is only guaranteed to see records produced from that point on.

For more information, you can read the reference information page, and you can find a reference implementation in the form of the Zeebe-maintained Elasticsearch exporter.
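
To make the shape of an exporter more concrete, here is a rough sketch of a minimal exporter. The package and method names below approximate the 0.2x io.zeebe.exporter.api interfaces; check the reference page for the exact signatures before implementing one.

import io.zeebe.exporter.api.Exporter;
import io.zeebe.exporter.api.context.Context;
import io.zeebe.exporter.api.context.Controller;
import io.zeebe.protocol.record.Record;

public class ConsoleExporter implements Exporter {
  private Controller controller;

  @Override
  public void configure(Context context) {
    // Read exporter-specific settings from the broker configuration here.
  }

  @Override
  public void open(Controller controller) {
    this.controller = controller;
  }

  @Override
  public void export(Record record) {
    // Push the record to the external system of your choice; here we just print it.
    System.out.println(record.toJson());
    // Acknowledge the position so the broker may eventually delete older data.
    controller.updateLastExportedRecordPosition(record.getPosition());
  }
}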

Considerations

The main impact exporters have on a Zeebe cluster is that they remove the burden of persisting data indefinitely.

Once data is not needed by Zeebe itself anymore, it will query its exporters to know if it can be safely deleted, and if so, will permanently erase it, thereby reducing disk usage.

Note: if no exporters are configured at all, then Zeebe will automatically erase data when it is no longer needed. If you need historical data, then you need to configure an exporter to stream records into your external data warehouse.

Clustering

Zeebe can operate as a cluster of brokers, forming a peer-to-peer network. In this network, all brokers have the same responsibilities and there is no single point of failure.

cluster

Gossip Membership Protocol

Zeebe implements the Gossip protocol to know which brokers are currently part of the cluster.

The cluster is bootstrapped using a set of well-known bootstrap brokers, to which the other ones can connect. To achieve this, each broker must have at least one bootstrap broker as its initial contact point in its configuration:

...
  cluster:
    initialContactPoints: [ node1.mycluster.loc:26502 ]

When a broker is connected to the cluster for the first time, it fetches the topology from the initial contact points and then starts gossiping with the other brokers. Brokers keep cluster topology locally across restarts.

Raft Consensus and Replication Protocol

To ensure fault tolerance, Zeebe replicates data across machines using the Raft protocol.

Data is divided into partitions (shards). Each partition has a number of replicas. Among the replica set, a leader is determined by the Raft protocol; the leader takes in requests and performs all the processing. All other brokers are passive followers. When the leader becomes unavailable, the followers transparently select a new leader.

Each broker in the cluster may be both leader and follower at the same time for different partitions. This way, client traffic is distributed evenly across all brokers.

cluster

Commit

Before a new record on a partition can be processed, it must be replicated to a quorum (typically majority) of followers. This procedure is called commit. Committing ensures that a record is durable even in case of complete data loss on an individual broker. The exact semantics of committing are defined by the raft protocol.

cluster

Zeebe Getting Started Tutorial

Welcome to the Zeebe Getting Started Tutorial.

We'll walk you through an end-to-end Zeebe example, including building and configuring a workflow model in Zeebe Modeler, deploying the model, creating and working on instances using the Zeebe Command Line Interface, and then seeing what's going on in a tool called Operate.

  1. Tutorial Setup
  2. Create a Workflow
  3. Deploy a Workflow
  4. Create and Complete Instances
  5. Next Steps and Resources

If you have questions about Zeebe, we encourage you to visit the Zeebe user forum.

Go To Tutorial Setup >>

Tutorial Setup

Welcome to the Getting Started tutorial for Zeebe and Operate. In this tutorial, we'll walk you through how to...

  • Model a workflow using Zeebe Modeler
  • Deploy the workflow to Zeebe
  • Create workflow instances
  • Use workers to complete jobs created by those workflow instances
  • Correlate messages to workflow instances
  • Monitor what's happening and get detail about running workflow instances in Operate

If this is your first time working with Zeebe, we expect this entire guide to take you 30-45 minutes to complete.

If you're looking for a very fast (but less comprehensive) "first contact" experience, you might prefer the Quickstart.

The tutorial assumes you have some basic knowledge of what Zeebe is and what it's used for. If you're completely new to Zeebe, you might find it helpful to read through the "What is Zeebe?" docs article first.

Below are the components you'll use in the tutorial. The easiest way to run them is to download the Zeebe Modeler and use the operate docker-compose profile in the zeebe-docker-compose repository. Further instructions for using Zeebe with Docker can be found in the README.md file in that repository.

You can also download the full distributions for these components, instead of running them with Docker.

  1. Zeebe Modeler: A desktop modeling tool that we'll use to create and configure our workflow before we deploy it to Zeebe.
  2. Zeebe Distribution: The Zeebe distribution contains the workflow engine where we'll deploy our workflow model; the engine is also responsible for managing the state of active workflow instances. Included in the distro is the Zeebe CLI, which we'll use throughout the tutorial. Please use Zeebe 0.20.0.
  3. Camunda Operate: An operations tool for monitoring and troubleshooting live workflow instances in Zeebe. Operate is currently available for free and unrestricted non-production use.
  4. Elasticsearch 6.8.0: An open-source distributed datastore that can connect to Zeebe to store workflow data for auditing, visualization, analysis, etc. Camunda Operate uses Elasticsearch as its underlying datastore, which is why you need to download Elasticsearch to complete this tutorial. Operate and Zeebe are compatible with Elasticsearch 6.8.0.

In case you're already familiar with BPMN and how to create a BPMN model in Zeebe Modeler, you can find the finished model that we create during the tutorial here: Zeebe Getting Started Tutorial Workflow Model.

If you're using the finished model we provide rather than building your own, you can also move ahead to section 3.3: Deploy a Workflow.

And if you have questions or feedback about the tutorial, we encourage you to visit the Zeebe user forum and ask a question.

There's a "Getting Started" category for topics that you can use when you ask your question or give feedback.

Next Page: Create a Workflow >>

Create a Workflow in Zeebe Modeler

New to BPMN and want to learn more before moving forward? This blog post helps to explain the standard and why it's a good fit for microservices orchestration.

In case you're already familiar with BPMN and how to create a BPMN model in Zeebe Modeler, you can find the finished model that we create during the tutorial here: Zeebe Getting Started Tutorial Workflow Model.

If you're using the finished model we provide rather than building your own, you can also move ahead to section 3.3: Deploy a Workflow.

Zeebe Modeler is a desktop modeling tool that allows you to build and configure workflow models using BPMN 2.0. In this section, we'll create a workflow model and get it ready to be deployed to Zeebe.

We'll create an e-commerce order process as our example, and we'll model a workflow that consists of:

  • Initiating a payment for an order
  • Receiving a payment confirmation message from an external system
  • Shipping the items in the order with or without insurance depending on order value

This is what your workflow model will look like when we're finished:

Getting Started Workflow Model

The payment task and shipping tasks are carried out by worker services that we'll connect to the workflow engine. The "Payment Received" message will be published to Zeebe by an external system, and Zeebe will then correlate the message to a workflow instance.

To get started

  • Open the Zeebe Modeler and create a new BPMN diagram.
  • Save the model as order-process.bpmn in the top level of the Zeebe broker directory that you just downloaded. As a reminder, this directory is named zeebe-broker-X.Y.Z, matching the version you downloaded.

The first element in your model will be a Start Event, which should already be on the canvas when you open the Modeler.

It's a BPMN best practice to label all elements in our model, so:

  • Double-click on the Start Event
  • Label it "Order Placed" to signify that our process will be initiated whenever a customer places an order

Next, we need to add a Service Task:

  • Click on the Start Event and select the Task icon
  • Label the newly created Task "Initiate Payment"
  • Click the wrench icon and change the Task to a Service Task

Next, we'll configure the "Initiate Payment" Service Task so that an external microservice can work on it:

  • Click on the "Initiate Payment" task
  • Expand the Properties panel on the right side of the screen if it's not already visible
  • In the Type field in the Properties panel, enter initiate-payment

This is what you should see in your Modeler now.

Initiate Payment Service Task

This Type field represents the job type in Zeebe. A couple of concepts that are important to understand at this point:

  • A job is simply a work item in a workflow that needs to be completed before a workflow instance can proceed to the next step. (See: Job Workers)
  • A workflow instance is one running instance of a workflow model--in our case, an individual order to be fulfilled. (See: Workflows)

For every workflow instance that arrives at the "Initiate Payment" Service Task, Zeebe will create a job with type initiate-payment. The external worker service responsible for payment processing--the so-called job worker--will poll Zeebe intermittently to ask if any jobs of type initiate-payment are available.

If a job is available for a given workflow instance, the worker will activate it, complete it, and notify Zeebe. Zeebe will then advance that workflow instance to the next step in the workflow.
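
As an aside, a job worker for this task could be written with the Java client roughly as follows. This is only a sketch, assuming the 0.2x io.zeebe.client API; the class name and the payment-system call are placeholders.

import io.zeebe.client.ZeebeClient;

public class InitiatePaymentWorker {
  public static void main(String[] args) {
    // Connect to a local broker or gateway; depending on your client version
    // you may also need to configure plaintext or TLS explicitly.
    final ZeebeClient client = ZeebeClient.newClientBuilder()
        .brokerContactPoint("127.0.0.1:26500")
        .build();

    // Poll Zeebe for jobs of type "initiate-payment" and complete them.
    client.newWorker()
        .jobType("initiate-payment")
        .handler((jobClient, job) -> {
          // ... call the real payment system here ...
          jobClient.newCompleteCommand(job.getKey()).send().join();
        })
        .open();
  }
}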

Next, we'll add a Message Event to the workflow:

  • Click on the "Initiate Payment" task on the Modeler
  • Select the circular icon with a double line border
  • Click on the wrench icon next to the newly created event
  • Select the Message Intermediate Catch Event
  • Double-click on the message event and label it "Payment Received"

Message Event

We use message catch events in Zeebe when the workflow engine needs to receive a message from an external system before the workflow instance can advance. (See: Message Events)

In the scenario we're modeling, we initiate a payment with our Service Task, but we need to wait for some other external system to actually confirm that the payment was received. This confirmation comes in the form of a message that will be sent to Zeebe - asynchronously - by an external service.

Messages received by Zeebe need to be correlated to specific workflow instances. To make this possible, we have some more configuring to do:

  • Select the Message Event and make sure you're on the "General" tab of the Properties panel on the right side of the screen
  • In the Properties panel, click the + icon to create a new message. You'll now see two fields in the Modeler that we'll use to correlate a message to a specific workflow instance: Message Name and Subscription Correlation Key.
  • Let's give this message a self-explanatory name: payment-received.

Add Message Name

When Zeebe receives a message, this name field lets us know which message event in the workflow model the message is referring to.

But how do we know which specific workflow instance--that is, which customer order--a message refers to? That's where Subscription Correlation Key comes in. The Subscription Correlation Key is a unique ID present in both the workflow instance payload and the message sent to Zeebe.

We'll use orderId for our correlation key.

Go ahead and add the expression = orderId to the Subscription Correlation Key field.

When we create a workflow instance, we need to be sure to include orderId as a variable, and we also need to provide orderId as a correlation key when we send a message.

Here's what you should see in the Modeler:

Message Correlation Key
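
Under the hood, the Modeler stores this message and its subscription in the BPMN XML roughly like the following sketch (the message id is illustrative):

<!-- sketch: message id is illustrative -->
<bpmn:message id="Message_payment_received" name="payment-received">
  <bpmn:extensionElements>
    <zeebe:subscription correlationKey="= orderId" />
  </bpmn:extensionElements>
</bpmn:message>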

Next, we'll add an Exclusive (XOR) Gateway to our workflow model. The Exclusive Gateway is used to make a data-based decision about which path a workflow instance should follow. In this case, we want to ship items with insurance if the total order value is greater than or equal to $100 and ship without insurance otherwise.

That means that when we create a workflow instance, we'll need to include order value as an instance variable. But we'll come to that later.

First, let's take the necessary steps to configure our workflow model to make this decision. To add the gateway:

  • Click on the Message Event you just created
  • Select the Gateway (diamond-shaped) symbol - the Exclusive Gateway is the default when you add a new gateway to a model
  • Double-click on the gateway and add a label "Order Value?" so that it's clear what we're using as our decision criteria

Add Exclusive Gateway to Model

Label Exclusive Gateway in Model

We'll add two outgoing Sequence Flows from this Exclusive Gateway that lead to two different Service Tasks. Each Sequence Flow will have a data-based condition that's evaluated in the context of the workflow instance payload.

Next, we need to:

  • Select the gateway and add a new Service Task to the model.
  • Label the task "Ship Without Insurance"
  • Set the Type to ship-without-insurance

Add No Insurance Service Task

Whenever we use an Exclusive Gateway, we want to be sure to set a default flow, which in this case will be shipping without insurance:

  • Select the Sequence Flow you just created from the gateway to the "Ship Without Insurance" Service Task
  • Click on the wrench icon
  • Choose "Default Flow"

Add No Insurance Service Task

Now we're ready to add a second outgoing Sequence Flow and Service Task from the gateway:

  • Select the gateway again
  • Add another Service Task to the model
  • Label it "Ship With Insurance"
  • Set the type to ship-with-insurance

Next, we'll set a condition expression in the Sequence Flow leading to this "Ship With Insurance" Service Task:

  • Click on the sequence flow and open the Properties panel
  • Input the expression = orderValue >= 100 in the "Condition expression" field in the Properties panel
  • Double-click on the sequence flow to add a label ">= $100"

Condition Expression
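
In the underlying BPMN XML, this condition ends up on the sequence flow roughly like the following sketch (element ids are illustrative):

<!-- sketch: element ids are illustrative -->
<bpmn:sequenceFlow id="to-ship-with-insurance" name="&#62;= $100"
  sourceRef="order-value-gateway" targetRef="ship-with-insurance">
  <bpmn:conditionExpression xsi:type="bpmn:tFormalExpression">
    = orderValue &gt;= 100
  </bpmn:conditionExpression>
</bpmn:sequenceFlow>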

We're almost finished! To wrap things up, we'll:

  • Select the "Ship Without Insurance" task
  • Add another Exclusive Gateway to the model to merge the branches together again (a BPMN best practice in a model like this one).
  • Select the "Ship With Insurance" task
  • Add an outgoing sequence flow that connects to the second Exclusive Gateway you just created

The only BPMN element we need to add is an End Event:

  • Click on the second Exclusive Gateway
  • Add an End Event
  • Double-click on it to label it "Order Fulfilled"

Condition Expression

Lastly, we'll change the process ID to something more descriptive than the default Process_1 that you'll see in the Modeler:

  • Click onto a blank part of the canvas
  • Open the Properties panel
  • Change the Id to order-process

Here's what you should see in the Modeler after these last few updates:

Update Process ID

That's all for our modeling step. Remember to save the file one more time to prepare to deploy the workflow to Zeebe, create workflow instances, and complete them.

Next Page: Deploy a Workflow >>

<< Previous Page: Tutorial Setup

Deploy a Workflow to Zeebe

In this section, we're going to start up the Zeebe broker as well as Camunda Operate, a tool that gives you visibility into deployed workflows and running workflow instances and contains tooling for fixing problems in those workflow instances.

We offer Operate free of charge for unrestricted non-production use because we think it's a great tool for getting familiar with Zeebe and building initial proofs-of-concept. And at this time, Operate is available for non-production use only. In the future, we'll offer an Operate enterprise license that allows for production use, too.

Before we run the Zeebe broker, we need to configure an Elasticsearch exporter in the Zeebe configuration file. Which leads to the question: what's an exporter, and why is Elasticsearch a part of this tutorial?

The answer is that Zeebe itself doesn't store historic data related to your workflow instances. If you want to keep this data for auditing or for analysis, you need to export it to another storage system. Zeebe does provide an easy-to-use exporter interface, and it also offers an Elasticsearch exporter out of the box. (See: Exporters)

Elasticsearch is also what Camunda Operate uses to store data, so to run Operate, you need to enable the Elasticsearch exporter in Zeebe and run an instance of Elasticsearch. In this section and the next section of the tutorial, we'll use Operate to visualize what's going on in Zeebe with each step we take.

If you are using Docker and zeebe-docker-compose then follow the instructions in the README file in the operate directory of that repository to start Zeebe and Operate. Once you have done that, skip the following section, and continue from "Check the status of the broker".

If you are using individual components, then you will need to manually configure and start components.

Manually configure and start Zeebe and Operate

These instructions are for using separate components, and are not necessary when using Docker.

First, copy the following lines into a new file named getting-started.yaml in the config directory of the Zeebe broker.

zeebe:
  broker:
    exporters:
      elasticsearch:
        className: io.zeebe.exporter.ElasticsearchExporter

These settings enable the Zeebe Elasticsearch exporter.

Note: Some command examples might not work on Windows if you use cmd or PowerShell. For this guide, we recommend that Windows users use a bash-like shell such as Git Bash, Cygwin, or MinGW.

Next, open Terminal or another command line tool and start up Elasticsearch.

cd elasticsearch-6.7.0

Linux / Mac

bin/elasticsearch

Windows

bin\elasticsearch.bat

You'll know that startup was successful when you see something like:

[2019-04-05T10:26:22,288][INFO ][o.e.n.Node ] [oy0juRR] started

Then start the Zeebe broker in another Terminal window.

./bin/broker --spring.config.location=file:./config/getting-started.yaml

And finally, start Operate in yet another Terminal window. Note that you'll need port 8080 in order to run Operate and access the UI, so be sure to check that it's available.

cd camunda-operate-distro-1.0.0-RC2
bin/operate

To confirm that Operate was started, go to http://localhost:8080. You should see the following:

Zeebe Configuration File

You can leave this tab open as we'll be returning to it shortly.

Check the status of the broker

You can use the Zeebe CLI to check the status of your broker. Open a new Terminal window to run it.

If you are using Docker, change into the zeebe-docker-compose directory.

If you are using separate components, then change into the Zeebe broker directory.

Run the following:

Linux

./bin/zbctl --insecure status

Mac

./bin/zbctl.darwin --insecure status

Windows

./bin/zbctl.exe --insecure status

You should see a response like this one:

Cluster size: 1
Partitions count: 1
Replication factor: 1
Brokers:
  Broker 0 - 0.0.0.0:26501
    Partition 0 : Leader

For all Zeebe-related operations moving forward, we'll be using Zeebe's command-line interface (CLI). In a real-world deployment, you likely wouldn't rely on the CLI to send messages or create job workers. Rather, you would embed Zeebe clients in worker microservices that connect to the Zeebe engine.

But for the sake of keeping this guide simple (and language agnostic), we're going to use the CLI.

Next, we'll use the CLI to deploy the workflow model we created in the previous section.

Linux

./bin/zbctl --insecure deploy order-process.bpmn

Mac

./bin/zbctl.darwin --insecure deploy order-process.bpmn

Windows

./bin/zbctl.exe --insecure deploy order-process.bpmn

You should see a response like this one:

{
  "key": 2,
  "workflows": [
    {
      "bpmnProcessId": "order-process",
      "version": 1,
      "workflowKey": 1,
      "resourceName": "order-process.bpmn"
    }
  ]
}

Now we'll take a look at the Operate user interface:

  • Go to http://localhost:8080 and use the credentials demo / demo to access Operate
  • Click on the "Running Instances" option in the navigation bar at the top of the interface
  • Select the order-process workflow from the "Workflows" selector on the left side of the screen

You should see the workflow model we just deployed – the same model we built in the previous section. You won't see any workflow instances because we haven't created them yet, and that's exactly what we'll do in the next section.

Zeebe Configuration File

Next Page: Create and Complete Instances >>

<< Previous Page: Create a Workflow

Create and Complete Workflow Instances

We're going to create 2 workflow instances for this tutorial: one with an order value less than $100 and one with an order value greater than or equal to $100 so that we can see our XOR Gateway in action.

Go back to the Terminal window where you deployed the workflow model and execute the following command.

Note: Windows users who want to execute this command using cmd or Powershell have to escape the variables differently.

  • cmd: "{\"orderId\": 1234}"
  • Powershell: '{"\"orderId"\": 1234}'

Linux

./bin/zbctl --insecure create instance order-process --variables '{"orderId": "1234", "orderValue":99}'

Mac

./bin/zbctl.darwin --insecure create instance order-process --variables '{"orderId": "1234", "orderValue":99}'

Windows (Powershell)

./bin/zbctl.exe --insecure create instance order-process --variables '{\"orderId\": \"1234\", \"orderValue\":99}'

You'll see a response like:

{
  "workflowKey": 1,
  "bpmnProcessId": "order-process",
  "version": 1,
  "workflowInstanceKey": 8
}

This first workflow instance we just created represents a single customer order with orderId 1234 and orderValue 99 (or, $99).

In the same Terminal window, run the command:

Linux

./bin/zbctl --insecure create instance order-process --variables '{"orderId": "2345", "orderValue":100}'

Mac

./bin/zbctl.darwin --insecure create instance order-process --variables '{"orderId": "2345", "orderValue":100}'

Windows (Powershell)

./bin/zbctl.exe --insecure create instance order-process --variables '{\"orderId\": \"2345\", \"orderValue\":100}'

This second workflow instance we just created represents a single customer order with orderId 2345 and orderValue 100 (or, $100).

If you go back to the Operate UI and refresh the page, you should now see two workflow instances (the green badge) waiting at the Initiate Payment task.

Workflow Instances in Operate

Note that the workflow instances can't move past this first task until we create a job worker to complete initiate-payment jobs. So that's exactly what we'll do next.

To make this point again: in a real-world use case, you probably won't manually create workflow instances using the Zeebe CLI. Rather, a workflow instance would be created programmatically in response to some business event, such as a message sent to Zeebe after a customer places an order. And instances might be created at very large scale if, for example, many customers were placing orders at the same time due to a sale. We're using the CLI here just for simplicity's sake.

We have two instances currently waiting at our "Initiate Payment" task, which means that Zeebe has created two jobs with type initiate-payment.

zbctl provides a command to spawn simple job workers using an external command or script. The job worker will receive the payload for every job as a JSON object on stdin and must also return its result as a JSON object on stdout if it handled the job successfully.

In this example, we'll use the Unix command cat, which simply outputs whatever it receives on stdin.

Open a new Terminal tab or window, change into the Zeebe broker directory, and use the following command to create a job worker that will work on the initiate-payment job.

Note: For Windows users, this command does not work with cmd because the cat command does not exist there. We recommend using PowerShell or a bash-like shell to execute this command.

Linux

./bin/zbctl --insecure create worker initiate-payment --handler cat

Mac

./bin/zbctl.darwin --insecure create worker initiate-payment --handler cat

Windows

./bin/zbctl.exe --insecure create worker initiate-payment --handler "findstr .*"

You should see a response along the lines of:

Activated job 12 with payload {"orderId":"2345","orderValue":100}
Activated job 7 with payload {"orderId":"1234","orderValue":99}
Handler completed job 12 with payload {"orderId":"2345","orderValue":100}
Handler completed job 7 with payload {"orderId":"1234","orderValue":99}

We can see that the job worker activated then completed the two available initiate-payment jobs. You can shut down the job worker if you'd like--you won't need it in the rest of the tutorial.

Now go to the browser tab where you're running Operate. You should see that the workflow instances have advanced to the Intermediate Message Catch Event and are waiting there.

Waiting at Message Event

The workflow instances will wait at the Intermediate Message Catch Event until a message is received by Zeebe and correlated to the instances. Messages can be published using Zeebe clients, and it's also possible for Zeebe to connect to a message queue such as Apache Kafka and correlate messages published there to workflow instances.

zbctl also supports message publishing, so we'll continue to use it in our demo. Below is the command we'll use to publish and correlate a message. You'll see that we provide the message "Name" that we assigned to this message event in the Zeebe Modeler as well as the orderId that we included in the payload of the instance when we created it.

Remember, orderId is the correlation key we set in the Modeler when configuring the message event. Zeebe requires both of these fields to be able to correlate a message to a workflow instance. Because we have two workflow instances with two distinct orderId values, we'll need to publish two messages. Run these two commands one after the other:

Linux

./bin/zbctl --insecure publish message "payment-received" --correlationKey="1234"
./bin/zbctl --insecure publish message "payment-received" --correlationKey="2345"

Mac

./bin/zbctl.darwin --insecure publish message "payment-received" --correlationKey="1234"
./bin/zbctl.darwin --insecure publish message "payment-received" --correlationKey="2345"

Windows

./bin/zbctl.exe --insecure publish message "payment-received" --correlationKey="1234"
./bin/zbctl.exe --insecure publish message "payment-received" --correlationKey="2345"

You won't see a response in your Terminal window, but if you refresh Operate, you should see that the messages were correlated successfully and that one workflow instance has advanced to the "Ship With Insurance" task and the other has advanced to the "Ship Without Insurance" task.

Waiting at Shipping Service Tasks

The good news is that this visualization confirms that our decision logic worked as expected: our workflow instance with an orderValue of $100 will ship with insurance, and our workflow instance with an orderValue of $99 will ship without insurance.

You probably know what you need to do next. Go ahead and open a Terminal window and create a job worker for the ship-without-insurance job type.

Linux

./bin/zbctl --insecure create worker ship-without-insurance --handler cat

Mac

./bin/zbctl.darwin --insecure create worker ship-without-insurance --handler cat

Windows

./bin/zbctl.exe --insecure create worker ship-without-insurance --handler "findstr .*"

You should see a response along the lines of:

Activated job 529 with payload {"orderId":"1234","orderValue":99}
Handler completed job 529 with payload {"orderId":"1234","orderValue":99}

You can shut down this worker now.

Select the "Finished Instances" checkbox in the bottom left of Operate, refresh the page, and voila! You'll see your first completed Zeebe workflow instance.

First Workflow Instance Complete

Because the "Ship With Insurance" task has a different job type, we need to create a second worker that can take on this job.

Linux

./bin/zbctl --insecure create worker ship-with-insurance --handler cat

Mac

./bin/zbctl.darwin --insecure create worker ship-with-insurance --handler cat

Windows

./bin/zbctl.exe --insecure create worker ship-with-insurance --handler "findstr .*"

You should see a response along the lines of:

Activated job 535 with payload {"orderId":"2345","orderValue":100}
Handler completed job 535 with payload {"orderId":"2345","orderValue":100}

You can shut down this worker, too.

Let's take one more look in Operate to confirm that both workflow instances have been completed.

Both Workflow Instances Complete

Hooray! You've completed the tutorial! Congratulations.

In the next and final section, we'll point you to resources we think you'll find helpful as you continue working with Zeebe.

Next Page: Next Steps & Resources >>

<< Previous Page: Deploy a Workflow

Next Steps & Resources

Zeebe's Java and Go clients each have Getting Started guides of their own, showing in much greater detail how you can use the clients in the worker services you orchestrate with Zeebe.

Beyond Java and Go, it's possible to create clients for Zeebe in a range of other programming languages, including JavaScript and C#, via community-supported libraries. The Awesome Zeebe page includes community-contributed clients in other languages, and this blog post walks through how to generate a new client stub for Zeebe using gRPC.

The Zeebe docs (where this tutorial is located) contain resources to help you move your Zeebe project forward.

If you have questions, you can get in touch with us via one of Zeebe's public support channels.

Please reach out if we can help you! We're here to offer support.

Lastly, we do a lot of writing about project news along with an occasional deep dive into the product in the Zeebe blog. And we usually make product announcements via Twitter and our email mailing list, which you can sign up for at the bottom of the homepage.

Thanks so much for working through this tutorial with us. We're really glad you're here, and we're happy to welcome you to the Zeebe community!

<< Previous Page: Create and Complete Instances

BPMN Workflows

Zeebe uses visual workflows based on the industry standard BPMN 2.0.

workflow

Read more about:

  • BPMN Primer
  • BPMN Coverage
  • Data Flow
  • Tasks
  • Gateways
  • Events
  • Subprocesses

BPMN Primer

Business Process Model and Notation (BPMN) 2.0 is an industry standard for workflow modeling and execution. A BPMN workflow is an XML document that has a visual representation. For example, here is a BPMN workflow:

workflow

The corresponding XML

<?xml version="1.0" encoding="UTF-8"?>
<bpmn:definitions xmlns:bpmn="http://www.omg.org/spec/BPMN/20100524/MODEL" xmlns:bpmndi="http://www.omg.org/spec/BPMN/20100524/DI" xmlns:di="http://www.omg.org/spec/DD/20100524/DI" xmlns:dc="http://www.omg.org/spec/DD/20100524/DC" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:zeebe="http://camunda.org/schema/zeebe/1.0" id="Definitions_1" targetNamespace="http://bpmn.io/schema/bpmn" exporter="Zeebe Modeler" exporterVersion="0.1.0">
  <bpmn:process id="Process_1" isExecutable="true">
    <bpmn:startEvent id="StartEvent_1" name="Order Placed">
      <bpmn:outgoing>SequenceFlow_1bq1azi</bpmn:outgoing>
    </bpmn:startEvent>
    <bpmn:sequenceFlow id="SequenceFlow_1bq1azi" sourceRef="StartEvent_1" targetRef="Task_1f47b9v" />
    <bpmn:sequenceFlow id="SequenceFlow_09hqjpg" sourceRef="Task_1f47b9v" targetRef="Task_1109y9g" />
    <bpmn:sequenceFlow id="SequenceFlow_1ea1mpb" sourceRef="Task_1109y9g" targetRef="Task_00moy91" />
    <bpmn:endEvent id="EndEvent_0a27csw" name="Order Delivered">
      <bpmn:incoming>SequenceFlow_0ojoaqz</bpmn:incoming>
    </bpmn:endEvent>
    <bpmn:sequenceFlow id="SequenceFlow_0ojoaqz" sourceRef="Task_00moy91" targetRef="EndEvent_0a27csw" />
    <bpmn:serviceTask id="Task_1f47b9v" name="Collect Money">
      <bpmn:extensionElements>
        <zeebe:taskDefinition type="collect-money" retries="3" />
      </bpmn:extensionElements>
      <bpmn:incoming>SequenceFlow_1bq1azi</bpmn:incoming>
      <bpmn:outgoing>SequenceFlow_09hqjpg</bpmn:outgoing>
    </bpmn:serviceTask>
    <bpmn:serviceTask id="Task_1109y9g" name="Fetch Items">
      <bpmn:extensionElements>
        <zeebe:taskDefinition type="fetch-items" retries="3" />
      </bpmn:extensionElements>
      <bpmn:incoming>SequenceFlow_09hqjpg</bpmn:incoming>
      <bpmn:outgoing>SequenceFlow_1ea1mpb</bpmn:outgoing>
    </bpmn:serviceTask>
    <bpmn:serviceTask id="Task_00moy91" name="Ship Parcel">
      <bpmn:extensionElements>
        <zeebe:taskDefinition type="ship-parcel" retries="3" />
      </bpmn:extensionElements>
      <bpmn:incoming>SequenceFlow_1ea1mpb</bpmn:incoming>
      <bpmn:outgoing>SequenceFlow_0ojoaqz</bpmn:outgoing>
    </bpmn:serviceTask>
  </bpmn:process>
  <bpmndi:BPMNDiagram id="BPMNDiagram_1">
    <bpmndi:BPMNPlane id="BPMNPlane_1" bpmnElement="Process_1">
      <bpmndi:BPMNShape id="_BPMNShape_StartEvent_2" bpmnElement="StartEvent_1">
        <dc:Bounds x="191" y="102" width="36" height="36" />
        <bpmndi:BPMNLabel>
          <dc:Bounds x="175" y="138" width="68" height="12" />
        </bpmndi:BPMNLabel>
      </bpmndi:BPMNShape>
      <bpmndi:BPMNEdge id="SequenceFlow_1bq1azi_di" bpmnElement="SequenceFlow_1bq1azi">
        <di:waypoint xsi:type="dc:Point" x="227" y="120" />
        <di:waypoint xsi:type="dc:Point" x="280" y="120" />
        <bpmndi:BPMNLabel>
          <dc:Bounds x="253.5" y="99" width="0" height="12" />
        </bpmndi:BPMNLabel>
      </bpmndi:BPMNEdge>
      <bpmndi:BPMNEdge id="SequenceFlow_09hqjpg_di" bpmnElement="SequenceFlow_09hqjpg">
        <di:waypoint xsi:type="dc:Point" x="380" y="120" />
        <di:waypoint xsi:type="dc:Point" x="440" y="120" />
        <bpmndi:BPMNLabel>
          <dc:Bounds x="410" y="99" width="0" height="12" />
        </bpmndi:BPMNLabel>
      </bpmndi:BPMNEdge>
      <bpmndi:BPMNEdge id="SequenceFlow_1ea1mpb_di" bpmnElement="SequenceFlow_1ea1mpb">
        <di:waypoint xsi:type="dc:Point" x="540" y="120" />
        <di:waypoint xsi:type="dc:Point" x="596" y="120" />
        <bpmndi:BPMNLabel>
          <dc:Bounds x="568" y="99" width="0" height="12" />
        </bpmndi:BPMNLabel>
      </bpmndi:BPMNEdge>
      <bpmndi:BPMNShape id="EndEvent_0a27csw_di" bpmnElement="EndEvent_0a27csw">
        <dc:Bounds x="756" y="102" width="36" height="36" />
        <bpmndi:BPMNLabel>
          <dc:Bounds x="734" y="142" width="81" height="12" />
        </bpmndi:BPMNLabel>
      </bpmndi:BPMNShape>
      <bpmndi:BPMNEdge id="SequenceFlow_0ojoaqz_di" bpmnElement="SequenceFlow_0ojoaqz">
        <di:waypoint xsi:type="dc:Point" x="696" y="120" />
        <di:waypoint xsi:type="dc:Point" x="756" y="120" />
        <bpmndi:BPMNLabel>
          <dc:Bounds x="726" y="99" width="0" height="12" />
        </bpmndi:BPMNLabel>
      </bpmndi:BPMNEdge>
      <bpmndi:BPMNShape id="ServiceTask_0lao700_di" bpmnElement="Task_1f47b9v">
        <dc:Bounds x="280" y="80" width="100" height="80" />
      </bpmndi:BPMNShape>
      <bpmndi:BPMNShape id="ServiceTask_0eetpqx_di" bpmnElement="Task_1109y9g">
        <dc:Bounds x="440" y="80" width="100" height="80" />
      </bpmndi:BPMNShape>
      <bpmndi:BPMNShape id="ServiceTask_09won99_di" bpmnElement="Task_00moy91">
        <dc:Bounds x="596" y="80" width="100" height="80" />
      </bpmndi:BPMNShape>
    </bpmndi:BPMNPlane>
  </bpmndi:BPMNDiagram>
</bpmn:definitions>

This duality makes BPMN very powerful. The XML document contains all the information necessary for it to be interpreted by workflow engines like Zeebe and by modeling tools. At the same time, the visual representation contains just enough information to be quickly understood by humans, even when they are non-technical people. The BPMN model is source code and documentation in one artifact.

The following is an introduction to BPMN 2.0, its elements and their execution semantics. It tries to briefly provide an intuitive understanding of BPMN's power but does not cover the entire feature set. For more exhaustive BPMN resources, see the reference links at the end of this section.

Modeling BPMN Diagrams

The best tool for modeling BPMN diagrams for Zeebe is the Zeebe Modeler.

overview

BPMN Elements

Sequence Flow: Controlling the Flow of Execution

A core concept of BPMN is a sequence flow that defines the order in which steps in the workflow happen. In BPMN's visual representation, a sequence flow is an arrow connecting two elements. The direction of the arrow indicates their order of execution.

workflow

You can think of workflow execution as tokens running through the workflow model. When a workflow is started, a token is spawned at the beginning of the model. It advances with every completed step. When the token reaches the end of the workflow, it is consumed and the workflow instance ends. Zeebe's task is to drive the token and to make sure that the job workers are invoked whenever necessary.

Tasks: Units of Work

The basic elements of BPMN workflows are tasks, atomic units of work that are composed to create a meaningful result. Whenever a token reaches a task, the token stops and Zeebe creates a job and notifies a registered worker to perform work. When that worker signals completion, the token continues along the outgoing sequence flow.

Choosing the granularity of a task is up to the person modeling the workflow. For example, the activity of processing an order can be modeled as a single Process Order task, or as three individual tasks: Collect Money, Fetch Items, and Ship Parcel. If you use Zeebe to orchestrate microservices, one task can represent one microservice invocation.

See the Tasks section on which types of tasks are currently supported and how to use them.

Gateways: Steering Flow

Gateways are elements that route tokens in more complex patterns than plain sequence flow.

BPMN's exclusive gateway chooses one sequence flow out of many based on data:

BPMN's parallel gateway generates new tokens by activating multiple sequence flows in parallel:

See the Gateways section on which types of gateways are currently supported and how to use them.

Events: Waiting for Something to Happen

Events in BPMN represent things that happen. A workflow can react to events (catching event) as well as emit events (throwing event). For example:

The circle with the envelope symbol is a catching message event. It makes the token continue as soon as a message is received. The XML representation of the workflow contains the criteria for which kind of message triggers continuation.

Events can be added to the workflow in various ways. Not only can they be used to make a token wait at a certain point, but also to interrupt a token's progress.

See the Events section on which types of events are currently supported and how to use them.

Sub Processes: Grouping Elements

Sub Processes are element containers that allow defining common functionality. For example, we can attach an event to a sub process's border:

payload

When the event is triggered, the sub process is interrupted regardless of which of its elements is currently active.

See the Sub Processes section on which types of sub processes are currently supported and how to use them.

Additional Resources

BPMN Coverage

Elements marked in orange are currently implemented by Zeebe.

Participants

Pool
Lane

Tasks

Service Task
User Task
Script Task
Business Rule Task
Manual Task
Receive Task
Undefined Task
Send Task
Receive Task (instantiated)

Data

Data Object
Data Store

Artifacts

Text Annotation
Group

Markers

Multi-Instance
Loop
Compensation
Ad-Hoc

Events

The event coverage matrix has the following columns:

  • Start: Normal, Event Sub-Process, Event Sub-Process (non-interrupting)
  • Intermediate: Catch, Boundary, Boundary (non-interrupting), Throw
  • End

Its rows are the event types: None, Message, Timer, Conditional, Link, Signal, Error, Escalation, Termination, Compensation, Cancel, Multiple, Multiple Parallel.

Data Flow

Every BPMN workflow instance can have one or more variables. Variables are key-value pairs and hold the contextual data of the workflow instance that is required by job workers to do their work or to decide which sequence flows to take. They can be provided when a workflow instance is created, when a job is completed, and when a message is correlated.

data-flow

Job Workers

By default, a job worker gets all variables of a workflow instance. It can limit the data by providing a list of required variables as fetchVariables.

The worker uses the variables to do its work. When the work is done, it completes the job. If the result of the work is needed by follow-up tasks, then the worker sets the variables while completing the job. These variables are merged into the workflow instance.

job-worker

If the job worker expects the variables in a different format or under different names then the variables can be transformed by defining input mappings in the workflow. Output mappings can be used to transform the job variables before merging them into the workflow instance.

Variable Scopes vs. Token-Based Data

A workflow can have concurrent paths, for example, when using a parallel gateway. When the execution reaches the parallel gateway then new tokens are spawned which execute the following paths concurrently.

Since the variables are part of the workflow instance and not of the token, they can be read globally from any token. If a token adds a variable or modifies the value of a variable then the changes are also visible to concurrent tokens.

variable-scopes

The visibility of variables is defined by the variable scopes of the workflow.

Concurrency considerations

When multiple active activities exist in a workflow instance (i.e. there is a form of concurrent execution, e.g. usage of a parallel gateway, multiple outgoing sequence flows, or a parallel multi-instance marker), you may need to take extra care in dealing with variables. A variable that is altered by one activity might also be accessed and altered by another activity at the same time. Race conditions can occur in such workflows.

We recommend taking care when writing variables in a parallel flow. Make sure the variables are written to the correct variable scope using variable mappings and make sure to complete jobs and publish messages only with the minimum required variables.

These types of problems can be avoided by:

  • passing only updated variables
  • using output variable mappings to customize the variable propagation
  • using an embedded subprocess and input variable mappings to limit the visibility and propagation of variables

Additional Resources

Tasks

Currently supported elements:

Service Tasks

A service task represents a work item in the workflow with a specific type.

workflow

When a service task is entered then a corresponding job is created. The workflow instance stops at this point and waits until the job is completed.

A worker can subscribe to the job type, process the jobs and complete them using one of the Zeebe clients. When the job is completed, the service task gets completed and the workflow instance continues.

Task Definition

A service task must have a taskDefinition. It specifies the type of job which workers can subscribe to.

Optionally, a taskDefinition can specify the number of times the job is retried when a worker signals failure (default = 3).

Usually, the job type and the job retries are defined as static values (e.g. order-items) but they can also be defined as expressions (e.g. = "order-" + priorityGroup). The expressions are evaluated on activating the service task and must result in a string for the job type and a number for the retries.
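
As a sketch, a taskDefinition that reuses the expression above for the job type could look like this (note that the double quotes inside the attribute are XML-escaped):

<!-- sketch: job type defined as an expression -->
<bpmn:serviceTask id="collect-money" name="Collect Money">
  <bpmn:extensionElements>
    <zeebe:taskDefinition type="= &quot;order-&quot; + priorityGroup" retries="3" />
  </bpmn:extensionElements>
</bpmn:serviceTask>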

Task Headers

A service task can define an arbitrary number of taskHeaders. They are static metadata that are handed to workers along with the job. The headers can be used as configuration parameters for the worker.

Variable Mappings

By default, all job variables are merged into the workflow instance. This behavior can be customized by defining an output mapping at the service task.

Input mappings can be used to transform the variables into a format that is accepted by the job worker.
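
As a sketch of what such mappings could look like on a service task (the variable names totalPrice, price, paymentStatus, and status are purely illustrative):

<!-- sketch: variable names are illustrative -->
<bpmn:serviceTask id="collect-money" name="Collect Money">
  <bpmn:extensionElements>
    <zeebe:taskDefinition type="payment-service" />
    <zeebe:ioMapping>
      <zeebe:input source="= totalPrice" target="price" />
      <zeebe:output source="= paymentStatus" target="status" />
    </zeebe:ioMapping>
  </bpmn:extensionElements>
</bpmn:serviceTask>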

Additional Resources

XML representation

A service task with a custom header:

<bpmn:serviceTask id="collect-money" name="Collect Money">
  <bpmn:extensionElements>
    <zeebe:taskDefinition type="payment-service" retries="5" />
    <zeebe:taskHeaders>
      <zeebe:header key="method" value="VISA" />
    </zeebe:taskHeaders>
  </bpmn:extensionElements>
</bpmn:serviceTask>

Using the BPMN modeler

Adding a service task:

service-task

Adding custom headers: task-headers

Adding variable mappings: variable-mappings

Workflow Lifecycle

Workflow instance records of a service task:

Intent Element Id Element Type
ELEMENT_ACTIVATING collect-money SERVICE_TASK
ELEMENT_ACTIVATED collect-money SERVICE_TASK
... ... ...
ELEMENT_COMPLETING collect-money SERVICE_TASK
ELEMENT_COMPLETED collect-money SERVICE_TASK

References:

Receive Tasks

Receive tasks are tasks that reference a message. They are used to wait until a matching message is received.

Receive Tasks

When a receive task is entered then a corresponding message subscription is created. The workflow instance stops at this point and waits until the message is correlated.

A message can be published using one of the Zeebe clients. When the message is correlated, the receive task gets completed and the workflow instance continues.

An alternative to receive tasks are message intermediate catch events, which behave the same but can be used together with event-based gateways.

Messages

A message can be referenced by one or more receive tasks. It must define the name of the message (e.g. Money collected) and the correlationKey expression (e.g. = orderId).

Usually, the name of the message is defined as a static value (e.g. order canceled), but it can also be defined as an expression (e.g. = "order " + awaitingAction). The expression is evaluated on activating the receive task and must result in a string.

The correlationKey is an expression that usually accesses a variable of the workflow instance that holds the correlation key of the message. The expression is evaluated on activating the receive task and must result either in a string or in a number.

In order to correlate a message to the receive task, the message is published with the defined name (e.g. Money collected) and the value of the correlationKey expression. For example, if the workflow instance has a variable orderId with value "order-123" then the message must be published with the correlation key "order-123".

Variable Mappings

By default, all message variables are merged into the workflow instance. This behavior can be customized by defining an output mapping at the receive task.

Additional Resources

XML representation

A receive task with message definition:

<bpmn:message id="Message_1iz5qtq" name="Money collected">
   <bpmn:extensionElements>
     <zeebe:subscription correlationKey="orderId" />
   </bpmn:extensionElements>
</bpmn:message>

<bpmn:receiveTask id="money-collected" name="Money collected"
  messageRef="Message_1iz5qtq">
</bpmn:receiveTask>

Using the BPMN modeler

Adding a receive task with message:

receive-task

Workflow Lifecycle

Workflow instance records of a receive task:

Intent Element Id Element Type
ELEMENT_ACTIVATING money-collected RECEIVE_TASK
ELEMENT_ACTIVATED money-collected RECEIVE_TASK
... ... ...
EVENT_OCCURRED money-collected RECEIVE_TASK
ELEMENT_COMPLETING money-collected RECEIVE_TASK
ELEMENT_COMPLETED money-collected RECEIVE_TASK

References:

Gateways

Currently supported elements:

Exclusive Gateway

An exclusive gateway (aka XOR-gateway) allows you to make a decision based on data (i.e. on workflow instance variables).

workflow

If an exclusive gateway has multiple outgoing sequence flows then all sequence flows, except one, must have a conditionExpression to define when the flow is taken. The gateway can have one sequence flow without conditionExpression which must be defined as the default flow.

When an exclusive gateway is entered then the conditionExpressions are evaluated. The workflow instance takes the first sequence flow whose condition is fulfilled.

If no condition is fulfilled then it takes the default flow of the gateway. In case the gateway has no default flow, an incident is created.

An exclusive gateway can also be used to join multiple incoming flows into one, in order to improve the readability of the BPMN. A joining gateway has pass-through semantics; it doesn't merge the incoming concurrent flows like a parallel gateway does.

Conditions

A conditionExpression defines when a flow is taken. It is a boolean expression that can access the workflow instance variables and compare them with literals or other variables. The condition is fulfilled when the expression returns true.

Multiple boolean values or comparisons can be combined as a conjunction (and) or a disjunction (or).

For example:

= totalPrice > 100

= order.customer = "Paul"

= orderCount > 15 or totalPrice > 50

= valid and orderCount > 0

Additional Resources

XML representation

An exclusive gateway with two outgoing sequence flows:

<bpmn:exclusiveGateway id="exclusiveGateway" default="else" />

<bpmn:sequenceFlow id="priceGreaterThan100" name="totalPrice &#62; 100"
  sourceRef="exclusiveGateway" targetRef="shipParcelWithInsurance">
  <bpmn:conditionExpression xsi:type="bpmn:tFormalExpression">
    = totalPrice &gt; 100
  </bpmn:conditionExpression>
</bpmn:sequenceFlow>

<bpmn:sequenceFlow id="else" name="else"
  sourceRef="exclusiveGateway" targetRef="shipParcel" />

Using the BPMN modeler

Adding an exclusive gateway with two outgoing sequence flows:

exclusive-gateway

Workflow Lifecycle

Workflow instance records of an exclusive gateway:

Intent Element Id Element Type
ELEMENT_ACTIVATING shipping-gateway EXCLUSIVE_GATEWAY
ELEMENT_ACTIVATED shipping-gateway EXCLUSIVE_GATEWAY
ELEMENT_COMPLETING shipping-gateway EXCLUSIVE_GATEWAY
ELEMENT_COMPLETED shipping-gateway EXCLUSIVE_GATEWAY
SEQUENCE_FLOW_TAKEN priceGreaterThan100 SEQUENCE_FLOW

References:

Parallel Gateway

A parallel gateway (aka AND-gateway) allows you to split the flow into concurrent paths.

workflow

When a parallel gateway with multiple outgoing sequence flows is entered then all flows are taken. The paths are executed concurrently and independently.

The concurrent paths can be joined using a parallel gateway with multiple incoming sequence flows. The workflow instance waits at the parallel gateway until each incoming sequence flow has been taken.

Note that the outgoing paths of the parallel gateway are executed concurrently, not in parallel in the sense of parallel threads: all records of a workflow instance are written to the same partition (single stream processor).

Additional Resources

XML representation

A parallel gateway with two outgoing sequence flows:

<bpmn:parallelGateway id="split" />

<bpmn:sequenceFlow id="to-ship-parcel" sourceRef="split" 
  targetRef="shipParcel" />

<bpmn:sequenceFlow id="to-process-payment" sourceRef="split" 
  targetRef="processPayment" />

Using the BPMN modeler

Adding a parallel gateway with two outgoing sequence flows:

parallel-gateway

Workflow Lifecycle

Workflow instance records of a parallel gateway:

Intent Element Id Element Type
ELEMENT_ACTIVATING split PARALLEL_GATEWAY
ELEMENT_ACTIVATED split PARALLEL_GATEWAY
ELEMENT_COMPLETING split PARALLEL_GATEWAY
ELEMENT_COMPLETED split PARALLEL_GATEWAY
SEQUENCE_FLOW_TAKEN to-ship-parcel SEQUENCE_FLOW
SEQUENCE_FLOW_TAKEN to-process-payment SEQUENCE_FLOW
... ... ...
SEQUENCE_FLOW_TAKEN to-join-1 SEQUENCE_FLOW
... ... ...
SEQUENCE_FLOW_TAKEN to-join-2 SEQUENCE_FLOW
ELEMENT_ACTIVATING join PARALLEL_GATEWAY
ELEMENT_ACTIVATED join PARALLEL_GATEWAY
ELEMENT_COMPLETING join PARALLEL_GATEWAY
ELEMENT_COMPLETED join PARALLEL_GATEWAY

Event-Based Gateway

An event-based gateway allows you to make a decision based on events.

workflow

An event-based gateway must have at least two outgoing sequence flows. Each sequence flow must be connected to an intermediate catch event of type timer or message.

When an event-based gateway is entered then the workflow instance waits at the gateway until one of the events is triggered. When the first event is triggered then the outgoing sequence flow of this event is taken. No other events of the gateway can be triggered afterward.

Additional Resources

XML representation

An event-based gateway with two outgoing sequence flows:

<bpmn:eventBasedGateway id="gateway" />

<bpmn:sequenceFlow id="s1" sourceRef="gateway" targetRef="payment-details-updated" />

<bpmn:intermediateCatchEvent id="payment-details-updated" 
  name="Payment Details Updated">
  <bpmn:messageEventDefinition messageRef="message-payment-details-updated" />
</bpmn:intermediateCatchEvent>

<bpmn:sequenceFlow id="s2" sourceRef="gateway" targetRef="wait-one-hour" />

<bpmn:intermediateCatchEvent id="wait-one-hour" name="1 hour">
  <bpmn:timerEventDefinition>
    <bpmn:timeDuration>PT1H</bpmn:timeDuration>
  </bpmn:timerEventDefinition>
</bpmn:intermediateCatchEvent>

Using the BPMN modeler

Adding an event-based gateway with two outgoing sequence flows:

event-based-gateway

Workflow Lifecycle

Workflow instance records of an event-based gateway:

Intent Element Id Element Type
ELEMENT_ACTIVATING gateway EVENT_BASED_GATEWAY
ELEMENT_ACTIVATED gateway EVENT_BASED_GATEWAY
... ... ...
EVENT_OCCURRED gateway EVENT_BASED_GATEWAY
ELEMENT_COMPLETING gateway EVENT_BASED_GATEWAY
ELEMENT_COMPLETED gateway EVENT_BASED_GATEWAY
ELEMENT_ACTIVATING payment-details-updated INTERMEDIATE_CATCH_EVENT

References:

Events

Currently supported events:

Events in General

Events in BPMN can be thrown (i.e. sent), or caught (i.e. received), respectively referred to as throw or catch events, e.g. message throw event, timer catch event.

Additionally, a distinction is made between start, intermediate, and end events:

  • Start events (catch events, as they can only react to something) are used to denote the beginning of a process or sub-process.
  • End events (throw events, as they indicate something has happened) are used to denote the end of a particular sequence flow.
  • Intermediate events can be used to indicate that something has happened (i.e. intermediate throw events), or to wait and react to certain events (i.e. intermediate catch events).

Intermediate catch events can be inserted into your process in two different contexts: in normal flow, or attached to an activity, in which case they are called boundary events.

Intermediate Events

In normal flow, an intermediate throw event will execute its event (e.g. send a message) once the token has reached it, and once done the token will continue to all outgoing sequence flows.

An intermediate catch event, however, will stop the token and wait until the event it is waiting for happens, at which point execution resumes and the token moves on.

Boundary events

Boundary events provide a way to model what should happen if an event occurs while an activity is currently active. For example, if a process is waiting on a user task that is taking too long, an intermediate timer catch event can be attached to the task, with an outgoing sequence flow to a notification task, allowing the modeler to automate sending a reminder email to the user.

A boundary event must be an intermediate catch event, and can be either interrupting or non-interrupting. Interrupting in this case means that once triggered, before taking any outgoing sequence flow, the activity the event is attached to will be terminated. This allows modeling timeouts where we want to prune certain execution paths if something happens, e.g. the process takes too long.

None Events

None events are unspecified events, also called ‘blank’ events.

workflow

None Start Events

A workflow can have at most one none start event (besides other types of start events).

A none start event is where the workflow instance or a subprocess starts when the workflow or the subprocess is activated.

None End Events

A workflow or subprocess can have multiple none end events. When a none end event is entered then the current execution path ends. If the workflow instance or subprocess has no more active execution paths then it is completed.

If an activity has no outgoing sequence flow then it behaves the same as if it were connected to a none end event. When the activity is completed then the current execution path ends.

Additional Resources

XML representation

A none start event:

<bpmn:startEvent id="order-placed" name="Order Placed" />

A none end event:

<bpmn:endEvent id="order-delivered" name="Order Delivered" />

Using the BPMN modeler

Adding a none start event:

start-event

Adding a none end event:

end-event

Workflow Lifecycle

Workflow instance records of a none start event:

Intent Element Id Element Type
ELEMENT_ACTIVATING order-placed START_EVENT
ELEMENT_ACTIVATED order-placed START_EVENT
ELEMENT_COMPLETING order-placed START_EVENT
ELEMENT_COMPLETED order-placed START_EVENT

Workflow instance records of a none end event:

Intent Element Id Element Type
ELEMENT_ACTIVATING order-delivered END_EVENT
ELEMENT_ACTIVATED order-delivered END_EVENT
ELEMENT_COMPLETING order-delivered END_EVENT
ELEMENT_COMPLETED order-delivered END_EVENT

Message Events

Message events are events that reference a message. They are used to wait until a matching message is received.

workflow

At the moment, messages can be published only externally by using one of the Zeebe clients.

Message Start Events

A workflow can have one or more message start events (besides other types of start events). Each of the message events must have a unique message name.

When a workflow is deployed then it creates a message subscription for each message start event. Message subscriptions of the previous version of the workflow (based on the BPMN process id) are closed.

When the message subscription is created then a message can be correlated to the start event if the message name matches. On correlating the message, a new workflow instance is created and the corresponding message start event is activated.

Messages are not correlated if they were published before the workflow was deployed, or if a new version of the workflow is deployed that no longer has a matching message start event.

The correlationKey of a published message can be used to control the workflow instance creation. If an instance of this workflow is active (independently from its version) and it was triggered by a message with the same correlationKey then the message is not correlated and no new instance is created. When the active workflow instance is ended (completed or terminated) and a message with the same correlationKey and a matching message name is buffered (i.e. TTL > 0) then this message is correlated and a new instance of the latest version of the workflow is created.

If the correlationKey of a message is empty then it will always create a new workflow instance and does not check if an instance is already active.

Intermediate Message Catch Events

When an intermediate message catch event is entered then a corresponding message subscription is created. The workflow instance stops at this point and waits until the message is correlated. When a message is correlated, the catch event gets completed and the workflow instance continues.

An alternative to intermediate message catch events are receive tasks, which behave the same but can be used together with boundary events.

Message Boundary Events

An activity can have one or more message boundary events. Each of the message events must have a unique message name.

When the activity is entered then it creates a corresponding message subscription for each boundary message event. If a non-interrupting boundary event is triggered then the activity is not terminated and multiple messages can be correlated.

Messages

A message can be referenced by one or more message events. It must define the name of the message (e.g. Money collected) and the correlationKey expression (e.g. = orderId). If the message is only referenced by message start events then the correlationKey is not required.

Usually, the name of the message is defined as a static value (e.g. order canceled), but it can also be defined as an expression (e.g. = "order " + awaitingAction). If the expression belongs to a message start event of the workflow, then it is evaluated on deploying the workflow. Otherwise, it is evaluated on activating the message event. The evaluation must result in a string.

The correlationKey is an expression that usually accesses a variable of the workflow instance that holds the correlation key of the message. The expression is evaluated on activating the message event and must result either in a string or in a number.

In order to correlate a message to the message event, the message is published with the defined name (e.g. Money collected) and the value of the correlationKey expression. For example, if the workflow instance has a variable orderId with value "order-123" then the message must be published with the correlation key "order-123".

Variable Mappings

By default, all message variables are merged into the workflow instance. This behavior can be customized by defining an output mapping at the message catch event.

Additional Resources

XML representation

A message start event with message definition:

<bpmn:message id="Message_0z0aft4" name="order-placed" />

<bpmn:startEvent id="order-placed" name="Order placed">
  <bpmn:messageEventDefinition messageRef="Message_0z0aft4" />
</bpmn:startEvent>

An intermediate message catch event with message definition:

<bpmn:message id="Message_1iz5qtq" name="money-collected">
  <bpmn:extensionElements>
    <zeebe:subscription correlationKey="= orderId" />
  </bpmn:extensionElements>
</bpmn:message>

<bpmn:intermediateCatchEvent id="money-collected" name="Money collected" >
  <bpmn:messageEventDefinition messageRef="Message_1iz5qtq" />
</bpmn:intermediateCatchEvent>

A boundary message event:

<bpmn:boundaryEvent id="order-canceled" name="Order Canceled"
  attachedToRef="collect-money">
  <bpmn:messageEventDefinition messageRef="Message_1iz5qtq" />
</bpmn:boundaryEvent>
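
A non-interrupting message boundary event could be sketched by setting cancelActivity to false (the referenced message Message_order_updated is hypothetical and would be defined like the messages above):

<!-- sketch: non-interrupting variant; Message_order_updated is hypothetical -->
<bpmn:boundaryEvent id="order-updated" name="Order Updated"
  cancelActivity="false" attachedToRef="collect-money">
  <bpmn:messageEventDefinition messageRef="Message_order_updated" />
</bpmn:boundaryEvent>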

Using the BPMN modeler

Adding an intermediate message catch event:

message-event

Workflow Lifecycle

Workflow instance records of a message start event:

Intent Element Id Element Type
EVENT_OCCURRED order-placed START_EVENT
ELEMENT_ACTIVATING order-placed START_EVENT
ELEMENT_ACTIVATED order-placed START_EVENT
ELEMENT_COMPLETING order-placed START_EVENT
ELEMENT_COMPLETED order-placed START_EVENT

Workflow instance records of an intermediate message catch event:

Intent Element Id Element Type
ELEMENT_ACTIVATING order-delivered INTERMEDIATE_CATCH_EVENT
ELEMENT_ACTIVATED order-delivered INTERMEDIATE_CATCH_EVENT
... ... ...
EVENT_OCCURRED money-collected INTERMEDIATE_CATCH_EVENT
ELEMENT_COMPLETING money-collected INTERMEDIATE_CATCH_EVENT
ELEMENT_COMPLETED money-collected INTERMEDIATE_CATCH_EVENT

References:

Timer Events

Timer events are events which are triggered by a defined timer.

workflow

Timer Start Events

A workflow can have one or more timer start events (besides other types of start events). Each of the timer events must have either a time date or time cycle definition.

When a workflow is deployed then it schedules a timer for each timer start event. Scheduled timers of the previous version of the workflow (based on the BPMN process id) are canceled.

When a timer is triggered then a new workflow instance is created and the corresponding timer start event is activated.

Intermediate Timer Catch Events

An intermediate timer catch event must have a time duration definition that defines when it is triggered.

When an intermediate timer catch event is entered then a corresponding timer is scheduled. The workflow instance stops at this point and waits until the timer is triggered. When the timer is triggered, the catch event gets completed and the workflow instance continues.

Timer Boundary Events

An interrupting timer boundary event must have a time duration definition. When the corresponding timer is triggered, the activity is terminated. Interrupting timer boundary events are often used to model timeouts, for example, canceling the processing after 5 minutes and doing something else.

A non-interrupting timer boundary event must have either a time duration or a time cycle definition. When the activity is entered, it schedules a corresponding timer. If the timer is triggered and is defined as a time cycle with repetitions > 0, then the timer is scheduled again until the defined number of repetitions is reached. Non-interrupting timer boundary events are often used to model notifications, for example, contacting support if the processing takes longer than one hour.

Timers

Timers must be defined by providing either a date, a duration, or a cycle.

A timer can be defined either as a static value (e.g. PT3D) or as an expression. There are two common ways for using an expression:

If the expression belongs to a timer start event of the workflow then it is evaluated on deploying the workflow. Otherwise, it is evaluated on activating the timer catch event. The evaluation must result either in a string that has the same ISO 8601 format as the static value or an equivalent temporal value (i.e. a date-time, a duration, or a cycle).
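
For example, an intermediate timer catch event whose duration comes from an expression might be sketched as follows (the remainingTime variable is hypothetical and must hold an ISO 8601 duration such as PT1H when the event is activated):

<!-- sketch: remainingTime is a hypothetical workflow instance variable -->
<bpmn:intermediateCatchEvent id="wait-for-retry">
  <bpmn:timerEventDefinition>
    <bpmn:timeDuration>= remainingTime</bpmn:timeDuration>
  </bpmn:timerEventDefinition>
</bpmn:intermediateCatchEvent>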

Time Date

A specific point in time defined in the ISO 8601 combined date and time representation. It must contain timezone information, either Z for UTC or a zone offset. Optionally, it can contain a zone id.

  • 2019-10-01T12:00:00Z - UTC time
  • 2019-10-02T08:09:40+02:00 - UTC plus 2 hours zone offset
  • 2019-10-02T08:09:40+02:00[Europe/Berlin] - UTC plus 2 hours zone offset at Berlin

Time Duration

A duration defined as ISO 8601 durations format.

  • PT15S - 15 seconds
  • PT1H30M - 1 hour and 30 minutes
  • P14D - 14 days

If the duration is zero or negative then the timer will fire immediately.

Time Cycle

A cycle defined as ISO 8601 repeating intervals format. It contains the duration and the number of repetitions. If the repetitions are not defined then the timer will be repeated infinitely until it is canceled.

  • R5/PT10S - every 10 seconds, up to 5 times
  • R/P1D - every day, infinitely

Additional Resources

XML representation

A timer start event with time date:

 <bpmn:startEvent id="release-date">
  <bpmn:timerEventDefinition>
    <bpmn:timeDate>2019-10-01T12:00:00Z</bpmn:timeDate>
  </bpmn:timerEventDefinition>
</bpmn:startEvent>

An intermediate timer catch event with time duration:

<bpmn:intermediateCatchEvent id="coffee-break">
  <bpmn:timerEventDefinition>
    <bpmn:timeDuration>PT10M</bpmn:timeDuration>
  </bpmn:timerEventDefinition>
</bpmn:intermediateCatchEvent>

A non-interrupting boundary timer event with time cycle:

<bpmn:boundaryEvent id="reminder" cancelActivity="false" attachedToRef="process-order">
  <bpmn:timerEventDefinition>
    <bpmn:timeCycle>R3/PT1H</bpmn:timeCycle>
  </bpmn:timerEventDefinition>
</bpmn:boundaryEvent>

Using the BPMN modeler

Adding an interrupting timer boundary event:

message-event

Workflow Lifecycle

Workflow instance records of a timer start event:

Intent Element Id Element Type
EVENT_OCCURRED release-date START_EVENT
ELEMENT_ACTIVATING release-date START_EVENT
ELEMENT_ACTIVATED release-date START_EVENT
ELEMENT_COMPLETING release-date START_EVENT
ELEMENT_COMPLETED release-date START_EVENT

Workflow instance records of an intermediate timer catch event:

Intent Element Id Element Type
ELEMENT_ACTIVATING coffee-break INTERMEDIATE_CATCH_EVENT
ELEMENT_ACTIVATED coffee-break INTERMEDIATE_CATCH_EVENT
... ... ...
EVENT_OCCURRED coffee-break INTERMEDIATE_CATCH_EVENT
ELEMENT_COMPLETING coffee-break INTERMEDIATE_CATCH_EVENT
ELEMENT_COMPLETED coffee-break INTERMEDIATE_CATCH_EVENT

References:

Error Events

Error events are events which reference an error. They are used to handle business errors in a workflow.

workflow

An error indicates that some kind of business error has occurred which should be handled in the workflow, for example, by taking a different path to compensate for the error.

Defining the Error

An error can be referenced by one or more error events. It must define the errorCode (e.g. Invalid Credit Card) of the error.

The errorCode is a string that must match the error code that is sent by the client command or from the error end event.

Catching the Error

An error can be caught using an error boundary event or an error event subprocess.

The boundary event or the event subprocess must be interrupting. When the error is caught then the service task gets terminated and the boundary event or event subprocess gets activated. That means the workflow instance continues where the error is caught instead of following the regular path.

An error is caught by the first event in the scope hierarchy that matches the error code. If the error is thrown from a service task then it can be caught by an attached boundary event. If the task has no boundary event or the error code does not match then the error is propagated to the parent or root scope of the workflow instance.

In case the workflow instance is created via call activity, the error can also be caught in the calling parent workflow instance.

Throwing the Error

An error can be thrown from a client command while processing a job. See the gRPC command for details.
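As an illustration, a minimal sketch with the Java client, assuming a worker has activated a job and decides to raise a business error (the error code and message are example values and must match the error definition in the workflow):

// Sketch: throw a business error for an activated job.
// Assumes an existing ZeebeClient named "client" and an activated job ("job") handled by a worker.
client.newThrowErrorCommand(job.getKey())
    .errorCode("Invalid Credit Card")            // must match the errorCode of the error definition
    .errorMessage("The credit card could not be charged")
    .send()
    .join();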

Alternatively, an error can also be thrown inside a workflow using an error end event.

workflow

Unhandled Errors

When an error is triggered then it should be handled in the workflow. If it is not handled (e.g. because of an unexpected error code) then an incident is raised to indicate the failure. The incident is attached to the corresponding service task of the processed job or to the error end event.

The incident cannot be resolved by the user because the failure lies in the workflow itself, which cannot be changed to handle the error for this workflow instance.

Business Error vs. Technical Error

While processing a job, two different types of errors can occur: a technical error (e.g. database connection interrupted) and a business error (e.g. invalid credit card).

A technical error is usually unexpected and should not be handled in the workflow. The error may disappear when the job is retried, or an incident is created to indicate that a user interaction is required.

A business error is expected and is handled in the workflow. The workflow may take a different path to compensate the error or undo previous actions.

Additional Resources

XML representation

A boundary error event:

<bpmn:error id="invalid-credit-card-error" errorCode="Invalid Credit Card" />

<bpmn:boundaryEvent id="invalid-credit-card" name="Invalid Credit Card" attachedToRef="collect-money">
 <bpmn:errorEventDefinition errorRef="invalid-credit-card-error" />
</bpmn:boundaryEvent>

Using the BPMN modeler

Adding an error boundary event:

bpmn-modeler

Workflow Lifecycle

Workflow instance records of an error boundary event:

Intent Element Id Element Type
EVENT_OCCURRED collect-money SERVICE_TASK
ELEMENT_TERMINATING collect-money SERVICE_TASK
ELEMENT_TERMINATED collect-money SERVICE_TASK
ELEMENT_ACTIVATING invalid-credit-card BOUNDARY_EVENT
ELEMENT_ACTIVATED invalid-credit-card BOUNDARY_EVENT
ELEMENT_COMPLETING invalid-credit-card BOUNDARY_EVENT
ELEMENT_COMPLETED invalid-credit-card BOUNDARY_EVENT


Subprocesses

Currently supported elements:

Embedded Subprocess

An embedded subprocess allows grouping elements of the workflow.

embedded-subprocess

An embedded subprocess must have exactly one none start event. Other start events are not allowed.

When an embedded subprocess is entered then the start event gets activated. The subprocess stays active as long as at least one contained element is active. When the last element is completed then the subprocess gets completed and the outgoing sequence flow is taken.

Embedded subprocesses are often used together with boundary events. One or more boundary events can be attached to a subprocess. When an interrupting boundary event is triggered then the whole subprocess, including all active elements, gets terminated.

Variable Mappings

Input mappings can be used to create new local variables in the scope of the subprocess. These variables are only visible within the subprocess.

By default, the local variables of the subprocess are not propagated (i.e. they are removed with the scope). This behavior can be customized by defining output mappings at the subprocess. The output mappings are applied on completing the subprocess.

Additional Resources

XML representation

An embedded subprocess with a start event:

<bpmn:subProcess id="process-order" name="Process Order">
  <bpmn:startEvent id="order-placed" />
  ... more contained elements ...
</bpmn:subProcess>

Using the BPMN modeler

Adding an embedded subprocess:

event-based-gateway

Workflow Lifecycle

Workflow instance records of an embedded subprocess:

Intent Element Id Element Type
ELEMENT_ACTIVATING process-order SUB_PROCESS
ELEMENT_ACTIVATED process-order SUB_PROCESS
ELEMENT_ACTIVATING order-placed START_EVENT
... ... ...
ELEMENT_COMPLETED items-fetched END_EVENT
ELEMENT_COMPLETING process-order SUB_PROCESS
ELEMENT_COMPLETED process-order SUB_PROCESS


Call Activities

A call activity (aka reusable subprocess) allows calling/invoking another workflow as part of this workflow. It is similar to an embedded subprocess, but the workflow is externalized (i.e. stored as a separate BPMN resource) and can be invoked by different workflows.

call-activity

When a call activity is entered then a new workflow instance of the referenced workflow is created. The new workflow instance gets activated at its none start event. The workflow can have start events of other types, but they are ignored.

When the created workflow instance is completed then the call activity is left and the outgoing sequence flow is taken.

Defining the Called Workflow

A call activity must define the BPMN process id of the called workflow as processId.

The new instance is created from the latest version of the defined workflow, determined at the point when the call activity is activated.

Usually, the processId is defined as a static value (e.g. shipping-process), but it can also be defined as an expression (e.g. = "shipping-" + tenantId). The expression is evaluated on activating the call activity and must result in a string.

Boundary Events

call-activity-boundary-event

Interrupting and non-interrupting boundary events can be attached to a call activity.

When an interrupting boundary event is triggered then the call activity and the created workflow instance are terminated. The variables of the created workflow instance are not propagated to the call activity.

When a non-interrupting boundary event is triggered then the created workflow instance is not affected. The activities on the outgoing path have no access to the variables of the created workflow instance since they are bound to the other workflow instance.

Variable Mappings

When the call activity is activated then all variables of the call activity scope are copied to the created workflow instance.

Input mappings can be used to create new local variables in the scope of the call activity. These variables are also copied to the created workflow instance.

If the attribute propagateAllChildVariables is set (default: true) then all variables of the created workflow instance are propagated to the call activity. This behavior can be customized by defining output mappings at the call activity. The output mappings are applied on completing the call activity and only those variables that are defined in the output mappings are propagated.

It is recommended to disable the attribute propagateAllChildVariables or define output mappings if the call activity is in a parallel flow (e.g. when it is marked as parallel multi-instance). Otherwise, it can happen that variables are overridden accidentally when they are changed in the parallel flow.

Additional Resources

XML representation

A call activity with static process id:

<bpmn:callActivity id="task-A" name="A">
  <bpmn:extensionElements>
    <zeebe:calledElement processId="child-process-id" />
  </bpmn:extensionElements>
</bpmn:callActivity>

Using the BPMN modeler

Adding a call activity with static process id:

call-activity

Workflow Lifecycle

Workflow instance records of a call activity:

Intent Element Id Element Type
ELEMENT_ACTIVATING task-a CALL_ACTIVITY
ELEMENT_ACTIVATED task-a CALL_ACTIVITY
ELEMENT_ACTIVATING child-process-id PROCESS
ELEMENT_ACTIVATED child-process-id PROCESS
... ... ...
ELEMENT_COMPLETED child-process-id PROCESS
ELEMENT_COMPLETING task-a CALL_ACTIVITY
ELEMENT_COMPLETED task-a CALL_ACTIVITY

The workflow instance records of the created workflow instance have a reference to its parent workflow instance (parentWorkflowInstanceKey) and the element instance of the call activity (parentElementInstanceKey).


Event Subprocess

An event subprocess is a subprocess that is triggered by an event. It can be added globally to the process or locally inside an embedded subprocess.

event-subprocess

An event subprocess must have exactly one start event of one of the following types:

An event subprocess behaves like a boundary event but is inside the scope instead of being attached to the scope. Like a boundary event, the event subprocess can be interrupting or non-interrupting (indicated in BPMN by a solid or dashed border of the start event). The start event of the event subprocess can be triggered when its containing scope is activated.

A non-interrupting event subprocess can be triggered multiple times. An interrupting event subprocess can be triggered only once.

When an interrupting event subprocess is triggered then all active instances of its containing scope are terminated, including instances of other non-interrupting event subprocesses.

If an event subprocess is triggered then its containing scope is not completed until the triggered instance is completed.

Variables

Unlike a boundary event, an event subprocess is inside the scope. So, it can access and modify all local variables of its containing scope. This is not possible with a boundary event because a boundary event is outside of the scope.

Input mappings can be used to create new local variables in the scope of the event subprocess. These variables are only visible within the event subprocess.

By default, the local variables of the event subprocess are not propagated (i.e. they are removed with the scope). This behavior can be customized by defining output mappings at the event subprocess. The output mappings are applied on completing the event subprocess.

Additional Resources

XML representation

An event subprocess with an interrupting timer start event:

<bpmn:subProcess id="compensate-subprocess" triggeredByEvent="true">
  <bpmn:startEvent id="cancel-order" isInterrupting="true">
    <bpmn:timerEventDefinition>
      <bpmn:timeDuration>PT5M</bpmn:timeDuration>
    </bpmn:timerEventDefinition>
  </bpmn:startEvent>
  ... other elements
</bpmn:subProcess>

Using the BPMN modeler

Adding an event subprocess with an interrupting timer start event:

event-subprocess

Workflow Lifecycle

Workflow instance records of an event subprocess with an interrupting timer start event:

Intent Element Id Element Type
EVENT_OCCURRED five-minutes START_EVENT
ELEMENT_TERMINATING fetch-item SERVICE_TASK
... ... ...
ELEMENT_TERMINATED fetch-item SERVICE_TASK
ELEMENT_ACTIVATING compensate-subprocess SUB_PROCESS
ELEMENT_ACTIVATED compensate-subprocess SUB_PROCESS
ELEMENT_ACTIVATING five-minutes START_EVENT
... ... ...
ELEMENT_COMPLETED order-cancelled END_EVENT
ELEMENT_COMPLETING compensate-subprocess SUB_PROCESS
ELEMENT_COMPLETED compensate-subprocess SUB_PROCESS
ELEMENT_COMPLETING order-process PROCESS
ELEMENT_COMPLETED order-process PROCESS


Markers

Currently supported markers:

Multi-Instance

The following activities can be marked as multi-instance:

A multi-instance activity is executed multiple times - once for each element of a given collection (like a foreach loop in a programming language).

multi-instance

On the execution level, a multi-instance activity has two parts: a multi-instance body and an inner activity. The multi-instance body is the container for all instances of the inner activity.

When the activity is entered, the multi-instance body is activated and one instance for every element of the inputCollection is created (sequentially or in parallel). When all instances are completed, the body is completed and the activity is left.

Sequential vs. Parallel

A multi-instance activity is executed either sequentially or in parallel (default). In BPMN, a sequential multi-instance activity is displayed with 3 horizontal lines at the bottom; a parallel one with 3 vertical lines.

In case of a sequential multi-instance activity, the instances are executed one-by-one. When one instance is completed then a new instance is created for the next element in the inputCollection.

sequential multi-instance

In case of a parallel multi-instance activity, all instances are created when the multi-instance body is activated. The instances are executed concurrently and independently from each other.

parallel multi-instance

Defining the Collection to Iterate over

A multi-instance activity must have an inputCollection expression that defines the collection to iterate over (e.g. = items). Usually, it accesses a variable of the workflow instance that holds the collection. The expression is evaluated on activating the multi-instance body. It must result in an array of any type (e.g. ["item-1", "item-2"]).

In order to access the current element of the inputCollection value within the instance, the multi-instance activity can define the inputElement variable (e.g. item). The element is stored as a local variable of the instance under the given name.

If the inputCollection value is empty then the multi-instance body is completed immediately and no instances are created. It behaves like the activity is skipped.

Collecting the Output

The output of a multi-instance activity (e.g. the result of a calculation) can be collected from the instances by defining the outputCollection and the outputElement expression.

outputCollection defines the name of the variable under which the collected output is stored (e.g. results). It is created as a local variable of the multi-instance body and gets updated when an instance is completed. When the multi-instance body is completed, the variable is propagated to its parent scope.

outputElement is an expression that defines the output of the instance (e.g. = result). Usually, it accesses a variable of the instance that holds the output value. If the expression only accesses a variable or a nested property then it is created as local variable of the instance. This variable should be updated with the output value, for example, by a job worker providing a variable with the name result. Since the variable is defined as a local variable, it is not propagated to its parent scope and is only visible within the instance.

When the instance is completed, the outputElement expression is evaluated and the result is inserted into the outputCollection at the same index as the inputElement of the inputCollection. So, the ordering of the outputCollection is deterministic and matches the inputCollection, even for parallel multi-instance activities. If the outputElement variable is not updated then null is inserted instead.

If the inputCollection value is empty then an empty array is propagated as outputCollection.
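For example, a job worker for the inner activity could provide the result variable that the outputElement expression (= result) reads. A rough sketch with the Java client, where the job type multiply and the doubling logic are only example values:

// Sketch: a worker for the inner activity of a multi-instance.
// It completes each job with a "result" variable, which the outputElement
// expression (= result) collects into the outputCollection.
// Assumes an existing ZeebeClient named "client" and the example job type "multiply".
client.newWorker()
    .jobType("multiply")
    .handler((jobClient, job) -> {
      // read the inputElement variable ("item") of this instance
      final int item = ((Number) job.getVariablesAsMap().get("item")).intValue();
      jobClient.newCompleteCommand(job.getKey())
          .variables(java.util.Collections.singletonMap("result", item * 2))
          .send()
          .join();
    })
    .open();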

Boundary Events

multi-instance with boundary event

Interrupting and non-interrupting boundary events can be attached to a multi-instance activity.

When an interrupting boundary event is triggered then the multi-instance body and all active instances are terminated. The outputCollection variable is not propagated to the parent scope (i.e. no partial output).

When a non-interrupting boundary event is triggered then the instances are not affected. The activities on the outgoing path have no access to the local variables since they are bound to the multi-instance activity.

Special Multi-Instance Variables

Every instance has a local variable loopCounter. It holds the index in the inputCollection of this instance, starting with 1.

Variable Mappings

Input and output variable mappings can be defined at the multi-instance activity. They are applied on each instance on activating and on completing.

The input mappings can be used to create new local variables in the scope of an instance. These variables are only visible within the instance. It is a way to restrict the visibility of variables. By default, new variables (e.g. provided by a job worker) are created in the scope of the workflow instance and are visible to all instances of the multi-instance activity as well as outside of it. In case of a parallel multi-instance activity, this can lead to variables that are modified by multiple instances and result in race conditions. If a variable is defined as local variable, then it is not propagated to a parent or the workflow instance scope and can't be modified outside of the instance.

The input mappings can access the local variables of the instance (e.g. inputElement, loopCounter). For example, to extract parts of the inputElement variable and apply them to separate variables.

The output mappings can be used to update the outputElement variable. For example, to extract a part of the job variables.

Example: say we have a call activity that is marked as a parallel multi-instance. When a called workflow instance completes, its variables get merged into the call activity's workflow instance. The result of each child instance is collected in the output collection variable, but without restrictions this becomes a race condition in which each completed child instance overwrites the same variable, and we end up with a corrupted output collection. An output mapping can be used to overcome this, because it restricts which variables get merged. Given:

  • parallel multi-instance call activity
  • multi-instance output element: =output
  • variable in the child instance that holds the result: x

The output mapping on the call activity should then be:

source: =x
target: output

Additional Resources

XML representation

A sequential multi-instance service task:

<bpmn:serviceTask id="task-A" name="A">
  <bpmn:multiInstanceLoopCharacteristics>
    <bpmn:extensionElements>
      <zeebe:loopCharacteristics isSequential="true"
          inputCollection="= items" inputElement="item"
          outputCollection="results" outputElement="= result" />
    </bpmn:extensionElements>
  </bpmn:multiInstanceLoopCharacteristics>
</bpmn:serviceTask>

Using the BPMN modeler

Adding the parallel multi-instance marker to a service task:

multi-instance

Workflow Lifecycle

Workflow instance records of a parallel multi-instance service task:

Intent Element Id Element Type
ELEMENT_ACTIVATING task-a MULTI_INSTANCE_BODY
ELEMENT_ACTIVATED task-a MULTI_INSTANCE_BODY
ELEMENT_ACTIVATING task-a SERVICE_TASK
ELEMENT_ACTIVATING task-a SERVICE_TASK
ELEMENT_ACTIVATED task-a SERVICE_TASK
ELEMENT_ACTIVATED task-a SERVICE_TASK
... ... ...
ELEMENT_COMPLETED task-a SERVICE_TASK
... ... ...
ELEMENT_COMPLETED task-a SERVICE_TASK
ELEMENT_COMPLETING task-a MULTI_INSTANCE_BODY
ELEMENT_COMPLETED task-a MULTI_INSTANCE_BODY


YAML Workflows

In addition to BPMN, Zeebe provides a YAML format to define workflows. Creating a YAML workflow can be done with a regular text editor and does not require a graphical modelling tool. It is inspired by imperative programming concepts and aims to be easily understandable by programmers. Internally, Zeebe transforms a deployed YAML file to BPMN.

name: order-process

tasks:
    - id: collect-money
      type: payment-service

    - id: fetch-items
      type: inventory-service

    - id: ship-parcel
      type: shipment-service


Tasks

A workflow can contain multiple tasks, where each represents a step in the workflow.

name: order-process

tasks:
    - id: collect-money
      type: payment-service

    - id: fetch-items
      type: inventory-service
      retries: 5

    - id: ship-parcel
      type: shipment-service
      headers:
            method: "express"
            withInsurance: false

Each task has the following properties:

  • id (required): the unique identifier of the task.
  • type (required): the name to which job workers can subscribe.
  • retries: the number of times the job is retried in case of failure (default = 3).
  • headers: a list of metadata in the form of key-value pairs that can be accessed by a worker.

When Zeebe executes a task, it creates a job that is handed to a job worker. The worker can perform the business logic and complete the job eventually to trigger continuation in the workflow.
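A minimal sketch of such a job worker using the Java client (the job type payment-service comes from the example above; the worker logic itself is just a placeholder):

// Sketch: a job worker subscribing to the "payment-service" task type from the YAML example.
// Assumes an existing ZeebeClient named "client".
client.newWorker()
    .jobType("payment-service")
    .handler((jobClient, job) -> {
      // ... perform the business logic (e.g. charge the customer) ...
      jobClient.newCompleteCommand(job.getKey())
          .send()
          .join();
    })
    .open();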


Control Flow

Control flow is about the order in which tasks are executed. The YAML format provides tools to decide which task is executed when.

Sequences

In a sequence, a task is executed after the previous one is completed. By default, tasks are executed top-down as they are declared in the YAML file.

name: order-process

tasks:
    - id: collect-money
      type: payment-service

    - id: fetch-items
      type: inventory-service

    - id: ship-parcel
      type: shipment-service

In the example above, the workflow starts with collect-money, followed by fetch-items and ends with ship-parcel.

We can use the goto and end attributes to define a different order:

name: order-process

tasks:
    - id: collect-money
      type: payment-service
      goto: ship-parcel

    - id: fetch-items
      type: inventory-service
      end: true

    - id: ship-parcel
      type: shipment-service
      goto: fetch-items

In the above example, we have reversed the order of fetch-items and ship-parcel. Note that the end attribute is required so that workflow execution stops after fetch-items.

Data-based Conditions

Some workflows do not always execute the same tasks but need to pick and choose different tasks, based on variables of the workflow instance.

We can use the switch attribute and conditions to decide on the next task.

name: order-process

tasks:
    - id: collect-money
      type: payment-service

    - id: fetch-items
      type: inventory-service
      switch:
          - case: totalPrice > 100
            goto: ship-parcel-with-insurance

          - default: ship-parcel

    - id: ship-parcel-with-insurance
      type: shipment-service-premium
      end: true

    - id: ship-parcel
      type: shipment-service

In the above example, the order-process starts with collect-money, followed by fetch-items. If the variable totalPrice is greater than 100, then it continues with ship-parcel-with-insurance. Otherwise, ship-parcel is chosen. In either case, the workflow instance ends after that.

In the switch element, there is one case element per alternative to choose from. If none of the conditions evaluates to true, then the default element is evaluated. While default is not required, it is best practice to include it to avoid errors at workflow runtime. Should such an error occur (i.e. no case is fulfilled and there is no default), then workflow execution stops and an incident is raised.


Data Flow

Zeebe carries custom data from task to task in the form of variables. Variables are key-value pairs and part of the workflow instance.

By default, all job variables are merged into the workflow instance. This behavior can be customized by defining an output mapping at the task. Input mappings can be used to transform the variables into a format that is accepted by the job worker.

name: order-process

tasks:
    - id: collect-money
      type: payment-service
      inputs:
          - source: totalPrice
            target: price
      outputs:
          - source: success
            target: paymentSuccess

    - id: fetch-items
      type: inventory-service

    - id: ship-parcel
      type: shipment-service

Every mapping element has a source and a target element which must be a variable expression.


Reference

This section gives in-depth explanations of Zeebe usage concepts.

Workflow Instance Creation

Depending on the workflow definition, an instance of it can be created in the following ways.

  • by a create workflow instance command
  • by an occurred event (e.g. timer, message)

Create workflow instance command

A workflow instance can be created by sending a command specifying the BPMN process id or the unique key of the workflow. There are two commands to create a workflow instance.

Create and Execute Asynchronously

A workflow that has a none start event can be started explicitly using the command CreateWorkflowInstance. When the broker receives this command, it creates a new workflow instance and immediately responds with the workflow instance key. The execution of the workflow happens after the response is sent.

create-workflow

Code example

Create a workflow instance:

zbctl create instance "order-process"

Response:

{
 "workflowKey": 2251799813685249,
 "bpmnProcessId": "order-process",
 "version": 1,
 "workflowInstanceKey": 2251799813686019
}
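The same command can be sent with the Java client; a rough sketch (the contact point is an example value, and the classes come from the Zeebe Java client):

// Sketch: create a workflow instance of the latest "order-process" version (Java client).
final ZeebeClient client = ZeebeClient.newClientBuilder()
    .brokerContactPoint("127.0.0.1:26500")
    .usePlaintext()
    .build();

// io.zeebe.client.api.response.WorkflowInstanceEvent
final WorkflowInstanceEvent workflowInstance = client.newCreateInstanceCommand()
    .bpmnProcessId("order-process")
    .latestVersion()
    .send()
    .join();

System.out.println("Created instance with key: " + workflowInstance.getWorkflowInstanceKey());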

Create and Await Results

Typically, workflow creation and execution are decoupled. However, there are use cases that need to collect the results of a workflow when its execution is completed. The CreateWorkflowInstanceWithResult command allows you to “synchronously” execute workflows and receive the results via a set of variables. The response is sent when the workflow execution is completed.

create-workflow

Failure scenarios that are applicable to other commands also apply to this command. Clients may not get a response in the following cases, even if the workflow execution completes successfully.

  • Leader failover: When the broker that is processing this workflow instance crashes, another broker continues the processing. But it does not send the response, because the request was registered on the broker that failed.
  • Gateway failure: If the gateway to which the client is connected fails, the broker cannot send the response to the client.
  • gRPC timeout: If the gRPC deadlines are not configured for a long request timeout, the connection may be closed before the workflow is completed.

This command is typically useful for short-running workflows and workflows that collect information. If the workflow mutates system state, or further operations rely on the workflow outcome being returned to the client, take care to consider and design your system for failure states and retries. Note that when the client resends the command, it creates a new workflow instance.

Code example

Create a workflow instance and await results:

zbctl create instance "order-process" --withResult --variables '{"orderId": "1234"}'

Response: (Note that the variables in the response depend on the workflow.)

{
  "workflowKey": 2251799813685249,
  "bpmnProcessId": "order-process",
  "version": 1,
  "workflowInstanceKey": 2251799813686045,
  "variables": "{\"orderId\":\"1234\"}"
}
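With the Java client, the equivalent might look like this sketch (the variables and the request timeout are example values):

// Sketch: create an instance and await its result (Java client).
// Assumes an existing ZeebeClient named "client".
// io.zeebe.client.api.response.WorkflowInstanceResult
final WorkflowInstanceResult result = client.newCreateInstanceCommand()
    .bpmnProcessId("order-process")
    .latestVersion()
    .variables(java.util.Collections.singletonMap("orderId", "1234"))
    .withResult()
    .requestTimeout(java.time.Duration.ofMinutes(5)) // should exceed the expected workflow duration
    .send()
    .join();

System.out.println("Result variables: " + result.getVariables());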

Workflow instance creation by events

Workflow instances are also created implicitly via various start events. Zeebe supports message start events and timer start events.

By publishing a message

A workflow with a message start event can be started by publishing a message with the name that matches the message name of the start event. For each new message a new instance is created.
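For example, publishing such a message with the Java client might look like the following sketch (the message name order-placed and the correlation key are example values):

// Sketch: publish a message that can trigger a message start event (Java client).
// Assumes an existing ZeebeClient named "client".
client.newPublishMessageCommand()
    .messageName("order-placed")               // must match the message name of the start event
    .correlationKey("order-123")
    .variables(java.util.Collections.singletonMap("orderId", "order-123"))
    .send()
    .join();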

Using a timer

A workflow can also have one or more timer start events. An instance of the workflow is created when the associated timer is triggered.

Distribution over partitions

When a workflow instance is created in a partition, its state is stored and managed by the same partition until its execution is terminated. The partition in which it is created is determined by various factors.

  • When a user sends a CreateWorkflowInstance or CreateWorkflowInstanceWithResult command, the gateway chooses a partition in a round-robin manner and forwards the request to that partition. The workflow instance is created in that partition.
  • When a user publishes a message, the message is forwarded to a partition based on the correlation key of the message. The workflow instance is created on the same partition where the message is published.
  • Workflow instances created by timer start events are always created on partition 1.

Workflow Lifecycles

In Zeebe, the workflow execution is represented internally by events of type WorkflowInstance. The events are written to the log stream and can be observed by an exporter.

Each event is one step in a workflow instance. All events of one workflow instance have the same workflowInstanceKey.

Events which belong to the same element instance (e.g. a task) have the same key. The element instances have different lifecycles depending on the type of element.

(Sub-)Process/Activity/Gateway Lifecycle

activity lifecycle

Event Lifecycle

event lifecycle

Sequence Flow Lifecycle

sequence flow lifecycle

Example

order process

Intent Element Id Element Type
ELEMENT_ACTIVATING order-process process
ELEMENT_ACTIVATED order-process process
ELEMENT_ACTIVATING order-placed start event
ELEMENT_ACTIVATED order-placed start event
ELEMENT_COMPLETING order-placed start event
ELEMENT_COMPLETED order-placed start event
SEQUENCE_FLOW_TAKEN to-collect-money sequence flow
ELEMENT_ACTIVATING collect-money task
ELEMENT_ACTIVATED collect-money task
ELEMENT_COMPLETING collect-money task
ELEMENT_COMPLETED collect-money task
SEQUENCE_FLOW_TAKEN to-fetch-items sequence flow
... ... ...
SEQUENCE_FLOW_TAKEN to-order-delivered sequence flow
EVENT_ACTIVATING order-delivered end event
EVENT_ACTIVATED order-delivered end event
ELEMENT_COMPLETING order-delivered end event
ELEMENT_COMPLETED order-delivered end event
ELEMENT_COMPLETING order-process process
ELEMENT_COMPLETED order-process process

Variables

Variables are part of a workflow instance and represent the data of the instance. A variable has a name and a JSON value. The visibility of a variable is defined by its variable scope.

Variable Names

The name of a variable can be any alphanumeric string including the _ symbol. For a combination of words, it is recommended to use the camelCase or the snake_case format. The kebab-case format is not allowed because it contains the operator -.

When accessing a variable in an expression, keep in mind that the variable name is case-sensitive.

Restrictions of a variable name:

  • it may not start with a number
  • it may not contain whitespaces
  • it may not contain an operator (e.g. +, -, *, /, =, >, ?, .)
  • it may not be a literal (e.g. null, true, false) or a keyword (e.g. function, if, then, else, for, between, instance, of, not)

Variable Values

The value of a variable is stored as a JSON value. It must have one of the following types:

  • String
  • Number
  • Boolean
  • Array
  • Document/Object
  • Null

Variable Scopes

Variable scopes define the visibility of variables. The root scope is the workflow instance itself. Variables in this scope are visible everywhere in the workflow.

When the workflow instance enters a sub process or an activity then a new scope is created. Activities in this scope can see all variables of this and of higher scopes (i.e. parent scopes). But activities outside of this scope cannot see the variables which are defined in this scope.

If a variable has the same name as a variable from a higher scope then it covers this variable. Activities in this scope see only the value of this variable and not the one from the higher scope.

The scope of a variable is defined when the variable is created. By default, variables are created in the root scope.

Example:

variable-scopes

This workflow instance has the following variables:

  • a and b are defined on the root scope and can be seen by Task A, Task B, and Task C.
  • c is defined in the sub process scope and can be seen by Task A and Task B.
  • b is defined again on the activity scope of Task A and can be seen only by Task A. It covers the variable b from the root scope.

Variable Propagation

When variables are merged into a workflow instance (e.g. on job completion, on message correlation) then each variable is propagated from the scope of the activity to its higher scopes.

The propagation ends when a scope contains a variable with the same name. In this case, the variable value is updated.

If no scope contains this variable then it is created as a new variable in the root scope.

Example:

variable-propagation

The job of Task B is completed with the variables b, c, and d. The variables b and c are already defined in higher scopes and are updated with the new values. Variable d doesn't exist before and is created in the root scope.

Local Variables

In some cases, variables should be set in a given scope, even if they didn't exist in this scope before.

In order to deactivate the variable propagation, the variables are set as local variables. That means that the variables are created or updated in the given scope, regardless of whether they existed in this scope before.
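For instance, the Java client can set variables locally in a given scope; a sketch (the element instance key identifies the target scope, e.g. taken from an activated job, and the variable is an example value):

// Sketch: set variables as local variables of a scope to avoid propagation (Java client).
// Assumes an existing ZeebeClient named "client" and a known elementInstanceKey of the target scope.
client.newSetVariablesCommand(elementInstanceKey)
    .variables(java.util.Collections.singletonMap("tempResult", 42))
    .local(true)   // create/update the variables in this scope only
    .send()
    .join();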

Input/Output Variable Mappings

Input/output variable mappings can be used to create new variables or customize how variables are merged into the workflow instance.

Variable mappings are defined in the workflow as extension elements under ioMapping. Every variable mapping has a source and a target expression.

The source expression defines the value of the mapping. Usually, it accesses a variable of the workflow instance that holds the value. If the variable or the nested property doesn't exist then an incident is created.

The target expression defines where the value of the source expression is stored. It can reference a variable by its name or a nested property of a variable. If the variable or the nested property doesn't exist then it is created.

Variable mappings are evaluated in the defined order. So, a source expression can access the target variable of a previous mapping.

Example:

variable-mappings

XML representation

<serviceTask id="collectMoney" name="Collect Money">
    <extensionElements>
      <zeebe:ioMapping>
        <zeebe:input source="= customer.name" target="sender"/>
        <zeebe:input source="= customer.iban" target="iban"/>
        <zeebe:input source="= totalPrice" target="price"/>
        <zeebe:input source="= reference" target="orderId"/>
        <zeebe:output source="= status" target="paymentStatus"/>
       </zeebe:ioMapping>
    </extensionElements>
</serviceTask>

Input Mappings

Input mappings can be used to create new variables. They can be defined on service tasks and sub processes.

When an input mapping is applied then it creates a new local variable in the scope where the mapping is defined.

Examples:

Example 1
  Workflow instance variables: orderId: "order-123"
  Input mappings:              source: = orderId, target: reference
  New variables:               reference: "order-123"

Example 2
  Workflow instance variables: customer: {"name": "John"}
  Input mappings:              source: = customer.name, target: sender
  New variables:               sender: "John"

Example 3
  Workflow instance variables: customer: "John", iban: "DE456"
  Input mappings:              source: = customer, target: sender.name
                               source: = iban, target: sender.iban
  New variables:               sender: {"name": "John", "iban": "DE456"}

Output Mappings

Output mappings can be used to customize how job/message variables are merged into the workflow instance. They can be defined on service tasks, receive tasks, message catch events and sub processes.

If one or more output mappings are defined then the job/message variables are set as local variables in the scope where the mapping is defined. Then, the output mappings are applied to the variables and create new variables in this scope. The new variables are merged into the parent scope. If there is no mapping for a job/message variable then the variable is not merged.

If no output mappings are defined then all job/message variables are merged into the workflow instance.

In case of a sub process, the behavior is different. There are no job/message variables to be merged. But output mappings can be used to propagate local variables of the sub process to higher scopes. By default, all local variables are removed when the scope is left.

Examples:

Example 1
  Job/message variables:       status: "Ok"
  Output mappings:             source: = status, target: paymentStatus
  Workflow instance variables: paymentStatus: "Ok"

Example 2
  Job/message variables:       result: {"status": "Ok", "transactionId": "t-789"}
  Output mappings:             source: = result.status, target: paymentStatus
                               source: = result.transactionId, target: transactionId
  Workflow instance variables: paymentStatus: "Ok", transactionId: "t-789"

Example 3
  Job/message variables:       status: "Ok", transactionId: "t-789"
  Output mappings:             source: = transactionId, target: order.transactionId
  Workflow instance variables: order: {"transactionId": "t-789"}

Expressions

Expressions can be used to access variables and calculate values dynamically.

The following attributes of BPMN elements require an expression:

Additionally, the following attributes of BPMN elements can define an expression optionally instead of a static value:

Expressions vs. Static Values

Some attributes of BPMN elements, like the timer definition of a timer catch event, can be defined either as a static value (e.g. PT2H) or as an expression (e.g. = remainingTime).

The value is identified as an expression if it starts with an equal sign = (i.e. the expression prefix). The text behind the equal sign is the actual expression. For example, = remainingTime defines the expression remainingTime that accesses a variable with the name remainingTime.

If the value doesn't have the prefix then it is used as a static value. A static value is used either as a string (e.g. job type) or as a number (e.g. job retries). A string value must not be enclosed in quotes.

Note that an expression can also define a static value by using literals (e.g. "foo", 21, true, [1,2,3], {x: 22}, etc.).

The Expression Language

An expression is written in FEEL (Friendly Enough Expression Language). FEEL is part of the OMG's DMN (Decision Model and Notation) specification. It is designed to have the following properties:

  • Side-effect free
  • Simple data model with JSON-like object types: numbers, dates, strings, lists, and contexts
  • Simple syntax designed for business professionals and developers
  • Three-valued logic (true, false, null)

Zeebe integrates the Feel-Scala engine (version 1.12.x) to evaluate FEEL expressions. The following sections cover common use cases in Zeebe. A complete list of supported expressions can be found in the project's documentation.

Access Variables

A variable can be accessed by its name.

owner
// "Paul"

totalPrice
// 21.2

items
// ["item-1", "item-2", "item-3"]

If a variable is a JSON document/object then it is handled as a FEEL context. A property of the context (aka nested variable property) can be accessed by . (a dot) and the property name.

order.id
// "order-123"

order.customer.name
// "Paul"

Boolean Expressions

Values can be compared using the following operators:

Operator Description Example
= (only one equal sign) equal to owner = "Paul"
!= not equal to owner != "Paul"
< less than totalPrice < 25
<= less than or equal to totalPrice <= 25
> greater than totalPrice > 25
>= greater than or equal to totalPrice >= 25
between _ and _ same as (x >= _ and x <= _) totalPrice between 10 and 25

Multiple boolean values can be combined as conjunction (and) or disjunction (or).

orderCount >= 5 and orderCount < 15

orderCount > 15 or totalPrice > 50

Null Checks

If a variable or a nested property can be null then it can be compared to the null value. Comparing null to a value different from null results in false.

order = null
// true - if "order" is null or doesn't exist

order.id = null
// true - if "order" is null, "order" doesn't exist,
//           "id" is null, or "order" has no property "id"

In addition to the comparison with null, the built-in function is defined() can be used to differentiate between a value that is null and a value that doesn’t exist.

is defined(order)
// true - if "order" has any value or is null

is defined(order.id)
// false - if "order" doesn't exist or it has no property "id"

String Expressions

A string value must be enclosed in double quotes. Multiple string values can be concatenated using the + operator.

"foo" + "bar"
// "foobar"

Any value can be transformed into a string value using the string() function.

"order-" + string(orderId)
// "order-123"

More functions for string values are available as built-in functions (e.g. contains, matches, etc.).

Temporal Expressions

The following operators can be applied on temporal values:

Temporal type: date
  Examples:  date("2020-04-06")
  Operators: date + duration, date - date, date - duration

Temporal type: time
  Examples:  time("15:30:00"), time("15:30:00+02:00"), time("15:30:00@Europe/Berlin")
  Operators: time + duration, time - time, time - duration

Temporal type: date-time
  Examples:  date and time("2020-04-06T15:30:00"), date and time("2020-04-06T15:30:00+02:00"), date and time("2020-04-06T15:30:00@UTC")
  Operators: date-time + duration, date-time - date-time, date-time - duration

Temporal type: duration
  Examples:  duration("PT12H"), duration("P4Y")
  Operators: duration + duration, duration + date, duration + time, duration + date-time, duration - duration, date - duration, time - duration, date-time - duration, duration * number, duration / duration, duration / number

Temporal type: cycle
  Examples:  cycle(3, duration("PT1H")), cycle(duration("P7D"))

    A temporal value can be compared in a boolean expression with another temporal value of the same type.

    The cycle type is different from the other temporal types because it is not supported in the FEEL type system. Instead, it is defined as a function that returns the definition of the cycle as a string in the ISO 8601 format of a recurring time interval. The function expects two arguments: the number of repetitions and the recurring interval as duration. If the first argument is null or not passed in then the interval is unbounded (i.e. infinitely repeated).

    cycle(3, duration("PT1H"))
    // "R3/PT1H"
    
    cycle(duration("P7D"))
    // "R/P7D"
    

    The current date and date-time can be accessed using the built-in functions today() and now(). In order to store the current date or date-time in a variable, it must be converted to a string using the built-in function string().

    now()
    // date and time("2020-04-06T15:30:00@UTC")
    
    today()
    // date("2020-04-06")
    
    string(today())
    // "2020-04-06"
    

    List Expressions

    An element of a list can be accessed by its index. The index starts at 1 with the first element (not at 0). A negative index starts at the end by -1. If the index is out of the range of the list then null is returned instead.

    ["a","b","c"][1]
    // "a"
    
    ["a","b","c"][2]
    // "b"
    
    ["a","b","c"][-1]
    // "c"
    

    A list value can be filtered using a boolean expression. The result is a list of elements that fulfill the condition. The current element in the condition is assigned to the variable item.

    [1,2,3,4][item > 2]
    // [3,4]
    

    The operators every and some can be used to test if all elements or at least one element of a list fulfill a given condition.

    every x in [1,2,3] satisfies x >= 2
    // false
    
    some x in [1,2,3] satisfies x > 2
    // true
    

    Invoke Functions

    FEEL defines a set of built-in functions to convert values and to apply different operations on specific value types in addition to the operators.

    A function can be invoked by its name followed by the arguments. The arguments can be assigned to the function parameters either by their position or by defining the parameter names.

    floor(1.5)
    // 1
    
    count(["a","b","c"])
    // 3
    
    append(["a","b"], "c")
    // ["a","b","c"]
    
    contains(string: "foobar", match: "foo")
    // true
    


    Message Correlation

    Message correlation describes how a message is correlated to a workflow instance. Messages can be correlated to the following elements:

    Message Subscriptions

    A message is not sent to a workflow instance directly. Instead, the message correlation is based on subscriptions that contain the message name and the correlation key (aka correlation value).

    Message Correlation

    A subscription is opened when a workflow instance awaits a message, for example, when entering a message catch event. The message name is defined either statically in the workflow (e.g. Money collected) or dynamically as an expression. The correlation key is defined dynamically as an expression (e.g. = orderId). The expressions are evaluated on activating the message catch event. The results of the evaluations are used as message name and as correlation key of the subscription (e.g. "order-123").

    When a message is published and both the message name and the correlation key match a subscription, the message is correlated to the corresponding workflow instance. If no matching subscription is opened then the message is discarded.

    A subscription is closed when the corresponding element (e.g. the message catch event) or its scope is left. After a subscription is opened, it is not updated, for example, when the referenced workflow instance variable is changed.

    Publish message via zbctl

    zbctl publish message "Money collected" --correlationKey "order-123"
    

    Message Cardinality

    A message is correlated only once to a workflow (based on the BPMN process id), across all versions of this workflow. If multiple subscriptions for the same workflow are opened (by multiple workflow instances or within one instance) then the message is correlated only to one of the subscriptions.

    When subscriptions are opened for different workflows then the message is correlated to all of the subscriptions.

    A message is not correlated to a message start event subscription if an instance of the workflow is active and was created by a message with the same correlation key. If the message is buffered then it can be correlated after the active instance is ended. Otherwise, it is discarded.

    Message Buffering

    Messages can be buffered for a given time. Buffering can be useful in a situation when it is not guaranteed that the subscription is opened before the message is published.

    A message has a time-to-live (TTL) which specifies for how long it is buffered. Within this time, the message can be correlated to a workflow instance.

    When a subscription is opened then it polls the buffer for a proper message. If a proper message exists then it is correlated to the corresponding workflow instance. In case multiple messages match to the subscription then the first published message is correlated (like a FIFO queue).

    The buffering of a message is disabled when its TTL is set to zero. If no proper subscription is opened then the message is discarded.

    Publish message with TTL via zbctl

    zbctl publish message "Money collected" --correlationKey "order-123" --ttl 1h
    

    Message Uniqueness

    A message can have a message id - a unique id to ensure that the message is published only once (i.e. idempotency). The id can be any string, for example, a request id, a tracking number or the offset/position in a message queue.

    A message is rejected and not correlated if a message with the same name, the same correlation key and the same id is already buffered. After the message is discarded from the buffer, a message with the same name, correlation key and id can be published again.

    The uniqueness check is disabled when no message id is set.

    Publish message with id via zbctl

    zbctl publish message "Money collected" --correlationKey "order-123" --messageId "tracking-12345"
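
    The same message attributes can be set via the Java client; a sketch (all values are example values):

    // Sketch: publish a message with a TTL and a message id using the Java client.
    // Assumes an existing ZeebeClient named "client".
    client.newPublishMessageCommand()
        .messageName("Money collected")
        .correlationKey("order-123")
        .timeToLive(java.time.Duration.ofHours(1))
        .messageId("tracking-12345")   // enables the uniqueness check
        .send()
        .join();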
    

    Message Patterns

    The following patterns describe solutions to common problems that can be solved using message correlation.

    Message Aggregator

    Problem: aggregate/collect multiple messages, map-reduce, batching

    Solution:

    Message Aggregator

    The messages are published with a TTL > 0 and a correlation key that groups the messages per entity.

    The first message creates a new workflow instance. The following messages are correlated to the same workflow instance if they have the same correlation key.

    When the instance is ended and messages with the same correlation key are not correlated yet then a new workflow instance is created.

    Single Instance

    Problem: create exactly one instance of a workflow

    Solution:

    Message Single Instance

    The message is published with a TTL = 0 and a correlation key that identifies the entity.

    The first message creates a new workflow instance. Following messages are discarded and do not create a new instance if they have the same correlation key and the created workflow instance is still active.

    Incidents

    In Zeebe, an incident represents a problem in a workflow execution. That means a workflow instance is stuck at some point and needs a user interaction to resolve the problem.

    Incidents are created in different situations, for example, when

    • a job is failed and it has no more retries left
    • an input or output variable mapping can't be applied
    • a condition can't be evaluated

    Note that incidents are not created when an unexpected exception happens at the broker (e.g. NullPointerException, OutOfMemoryError, etc.).

    Resolving

    In order to resolve an incident, the user must identify and resolve the problem first. Then, the user marks the incident as resolved and the broker tries to continue the workflow execution. If the problem still exists then a new incident is created.

    Resolving a Job-related Incident

    If a job is failed and it has no more retries left then an incident is created. There can be different reasons why the job is failed, for example, the variables are not in the expected format, or a service is not available (e.g. a database).

    In case that it is caused by the variables, the user needs to update the variables of the workflow instance first. Then, the user needs to increase the remaining retries of the job and mark the incident as resolved.

    Using the Java client, this could look like:

    client.newSetVariablesCommand(incident.getElementInstanceKey())
        .variables(NEW_PAYLOAD)
        .send()
        .join();
    
    client.newUpdateRetriesCommand(incident.getJobKey())
        .retries(3)
        .send()
        .join();
    
    client.newResolveIncidentCommand(incident.getKey())
        .send()
        .join();        
    

    When the incident is resolved then the job can be activated by a worker again.

    Resolving a Workflow Instance-related Incident

    If an incident is created during workflow execution and it is not related to a job, then it is usually related to the variables of the workflow instance. For example, an input or output variable mapping can't be applied.

    To resolve the incident, the user needs to update the variables first and then mark the incident as resolved.

    Using the Java client, this could look like:

    client.newSetVariablesCommand(incident.getElementInstanceKey())
        .variables(NEW_VARIABLES)
        .send()
        .join();
    
    client.newResolveIncidentCommand(incident.getKey())
        .send()
        .join();        
    

    When the incident is resolved then the workflow instance continues.

    gRPC API Reference

    Error handling

    The gRPC API for Zeebe is exposed through the gateway, which acts as a proxy for the broker. Generally, this means that the client executes a remote call on the gateway, which is then translated to the special binary protocol that the gateway uses to communicate with the broker.

    As a result of this proxying, any errors which occur between the gateway and the broker for which the client is not at fault (e.g. the gateway cannot deserialize the broker response, the broker is unavailable, etc.) are reported to the client using the following error codes.

    • GRPC_STATUS_RESOURCE_EXHAUSTED: if the broker is receiving more requests than it can handle, it kicks off back-pressure and rejects requests with this error code. In this case, it is possible to retry the requests with an appropriate retry strategy. If you receive many such errors within a small time period, it indicates that the broker is constantly under high load. It is recommended to reduce the rate of requests. When back-pressure kicks in, the broker may reject any request except the CompleteJob RPC and FailJob RPC. These requests are whitelisted for back-pressure and are always accepted by the broker, even if it is receiving requests above its limits.
    • GRPC_STATUS_UNAVAILABLE: if the gateway itself is in an invalid state (e.g. out of memory)
    • GRPC_STATUS_INTERNAL: for any other internal errors that occurred between the gateway and the broker.

    This behavior applies to every single possible RPC; in these cases, it is possible that retrying would succeed, but it is recommended to do so with an appropriate retry policy (e.g. a combination of exponential backoff with jitter, wrapped in a circuit breaker).
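
    As an illustration only, a client might detect back-pressure rejections and retry with backoff roughly like the following sketch (using the grpc-java Status API; the command, retry count, and backoff values are arbitrary example values):

    // Sketch: retry a command when the broker signals back-pressure (RESOURCE_EXHAUSTED).
    // Assumes an existing ZeebeClient named "client".
    long backoffMillis = 100;
    for (int attempt = 0; attempt < 5; attempt++) {
      try {
        client.newCreateInstanceCommand()
            .bpmnProcessId("order-process")
            .latestVersion()
            .send()
            .join();
        break; // success
      } catch (RuntimeException e) {
        if (io.grpc.Status.fromThrowable(e).getCode() != io.grpc.Status.Code.RESOURCE_EXHAUSTED) {
          throw e; // not back-pressure: rethrow
        }
        // back off before retrying (exponential backoff; add jitter in real code)
        java.util.concurrent.locks.LockSupport.parkNanos(backoffMillis * 1_000_000L);
        backoffMillis *= 2;
      }
    }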

    In the documentation below, the documented errors are business logic errors, meaning errors which are a result of request processing logic, and not serialization, network, or other more general errors.

    As the gRPC server/client is based on generated code, keep in mind that any call made to the server can return errors as described by the spec here.

    Gateway service

    The Zeebe gRPC API is exposed through a single gateway service.

    ActivateJobs RPC

    Iterates through all known partitions in a round-robin fashion, activates up to the requested maximum number of jobs, and streams them back to the client as they are activated.

    Input: ActivateJobsRequest

    message ActivateJobsRequest {
      // the job type, as defined in the BPMN process (e.g. <zeebe:taskDefinition
      // type="payment-service" />)
      string type = 1;
      // the name of the worker activating the jobs, mostly used for logging purposes
      string worker = 2;
      // a job returned after this call will not be activated by another call until the
      // timeout (in ms) has been reached
      int64 timeout = 3;
      // the maximum jobs to activate by this request
      int32 maxJobsToActivate = 4;
      // a list of variables to fetch as the job variables; if empty, all visible variables at
      // the time of activation for the scope of the job will be returned
      repeated string fetchVariable = 5;
      // The request will be completed when at least one job is activated or after the requestTimeout (in ms).
      // if the requestTimeout = 0, a default timeout is used.
      // if the requestTimeout < 0, long polling is disabled and the request is completed immediately, even when no job is activated.
      int64 requestTimeout = 6;
    }
    

    Output: ActivateJobsResponse

    message ActivateJobsResponse {
      // list of activated jobs
      repeated ActivatedJob jobs = 1;
    }
    
    message ActivatedJob {
      // the key, a unique identifier for the job
      int64 key = 1;
      // the type of the job (should match what was requested)
      string type = 2;
      // the job's workflow instance key
      int64 workflowInstanceKey = 3;
      // the bpmn process ID of the job workflow definition
      string bpmnProcessId = 4;
      // the version of the job workflow definition
      int32 workflowDefinitionVersion = 5;
      // the key of the job workflow definition
      int64 workflowKey = 6;
      // the associated task element ID
      string elementId = 7;
      // the unique key identifying the associated task, unique within the scope of the
      // workflow instance
      int64 elementInstanceKey = 8;
      // a set of custom headers defined during modelling; returned as a serialized
      // JSON document
      string customHeaders = 9;
      // the name of the worker which activated this job
      string worker = 10;
      // the amount of retries left to this job (should always be positive)
      int32 retries = 11;
      // when the job can be activated again, sent as a UNIX epoch timestamp
      int64 deadline = 12;
      // JSON document, computed at activation time, consisting of all visible variables to
      // the task scope
      string variables = 13;
    }
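
    For reference, the same RPC is exposed by the Java client; a sketch (job type, limits, and worker name are example values):

    // Sketch: activate up to 32 jobs of type "payment-service" via the Java client.
    // Assumes an existing ZeebeClient named "client".
    // Response type: io.zeebe.client.api.response.ActivateJobsResponse
    final ActivateJobsResponse response = client.newActivateJobsCommand()
        .jobType("payment-service")
        .maxJobsToActivate(32)
        .timeout(java.time.Duration.ofMinutes(5))  // time before the jobs can be activated again
        .workerName("payment-worker")
        .send()
        .join();

    response.getJobs().forEach(job -> System.out.println("activated job " + job.getKey()));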
    

    Errors

    GRPC_STATUS_INVALID_ARGUMENT

    Returned if:

    • type is blank (empty string, null)
    • worker is blank (empty string, null)
    • timeout less than 1 (ms)
    • maxJobsToActivate is less than 1

    CancelWorkflowInstance RPC

    Cancels a running workflow instance

    Input: CancelWorkflowInstanceRequest

    message CancelWorkflowInstanceRequest {
      // the workflow instance key (as, for example, obtained from
      // CreateWorkflowInstanceResponse)
      int64 workflowInstanceKey = 1;
    }
    

    Output: CancelWorkflowInstanceResponse

    message CancelWorkflowInstanceResponse {
    }
    

    Errors

    GRPC_STATUS_NOT_FOUND

    Returned if:

    • no workflow instance exists with the given key. Note that since workflow instances are removed once they are finished, it could mean the instance did exist at some point.
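
    A hedged sketch of cancelling an instance via the Java client follows; the workflow instance key is hypothetical, e.g. taken from a CreateWorkflowInstanceResponse.

    import io.zeebe.client.ZeebeClient;
    
    public class CancelWorkflowInstanceExample {
      public static void main(final String[] args) {
        // hypothetical key, e.g. obtained when the instance was created
        final long workflowInstanceKey = 2113425532L;
    
        try (final ZeebeClient client = ZeebeClient.newClientBuilder()
            .gatewayAddress("127.0.0.1:26500").usePlaintext().build()) {
          client.newCancelInstanceCommand(workflowInstanceKey).send().join();
          System.out.println("Cancelled workflow instance " + workflowInstanceKey);
        }
      }
    }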

    CompleteJob RPC

    Completes a job with the given variables, which allows completing the associated service task.

    Input: CompleteJobRequest

    message CompleteJobRequest {
      // the unique job identifier, as obtained from ActivateJobsResponse
      int64 jobKey = 1;
      // a JSON document representing the variables in the current task scope
      string variables = 2;
    }
    

    Output: CompleteJobResponse

    message CompleteJobResponse {
    }
    

    Errors

    GRPC_STATUS_NOT_FOUND

    Returned if:

    • no job exists with the given job key. Note that since jobs are removed once completed, it could be that this job did exist at some point.
    GRPC_STATUS_FAILED_PRECONDITION

    Returned if:

    • the job was marked as failed. In that case, the related incident must be resolved before the job can be activated again and completed.
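
    As a sketch, completing a job through the Java client might look as follows; the job key and variable values are hypothetical.

    import io.zeebe.client.ZeebeClient;
    import java.util.Collections;
    
    public class CompleteJobExample {
      public static void main(final String[] args) {
        // hypothetical job key, e.g. taken from an ActivateJobsResponse
        final long jobKey = 123L;
    
        try (final ZeebeClient client = ZeebeClient.newClientBuilder()
            .gatewayAddress("127.0.0.1:26500").usePlaintext().build()) {
          client.newCompleteCommand(jobKey)
              .variables(Collections.singletonMap("totalPrice", 46.50)) // "variables" field
              .send()
              .join();
        }
      }
    }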

    CreateWorkflowInstance RPC

    Creates and starts an instance of the specified workflow. The workflow definition to use to create the instance can be specified either using its unique key (as returned by DeployWorkflow), or using the BPMN process ID and a version. Pass -1 as the version to use the latest deployed version.

    Note that only workflows with none start events can be started through this command.

    Input: CreateWorkflowInstanceRequest

    message CreateWorkflowInstanceRequest {
      // the unique key identifying the workflow definition (e.g. returned from a workflow
      // in the DeployWorkflowResponse message)
      int64 workflowKey = 1;
      // the BPMN process ID of the workflow definition
      string bpmnProcessId = 2;
      // the version of the process; set to -1 to use the latest version
      int32 version = 3;
      // JSON document that will instantiate the variables for the root variable scope of the
      // workflow instance; it must be a JSON object, as variables will be mapped in a
      // key-value fashion. e.g. { "a": 1, "b": 2 } will create two variables, named "a" and
      // "b" respectively, with their associated values. [{ "a": 1, "b": 2 }] would not be a
      // valid argument, as the root of the JSON document is an array and not an object.
      string variables = 4;
    }
    

    Output: CreateWorkflowInstanceResponse

    message CreateWorkflowInstanceResponse {
      // the key of the workflow definition which was used to create the workflow instance
      int64 workflowKey = 1;
      // the BPMN process ID of the workflow definition which was used to create the workflow
      // instance
      string bpmnProcessId = 2;
      // the version of the workflow definition which was used to create the workflow instance
      int32 version = 3;
      // the unique identifier of the created workflow instance; to be used wherever a request
      // needs a workflow instance key (e.g. CancelWorkflowInstanceRequest)
      int64 workflowInstanceKey = 4;
    }
    

    CreateWorkflowInstanceWithResult RPC

    Similar to the CreateWorkflowInstance RPC, this creates and starts an instance of the specified workflow. Unlike CreateWorkflowInstance, the response is returned when the workflow is completed.

    Note that only workflows with none start events can be started through this command.

    Input: CreateWorkflowInstanceWithResultRequest

    message CreateWorkflowInstanceWithResultRequest {
       CreateWorkflowInstanceRequest request = 1;
       // timeout (in ms). the request will be closed if the workflow is not completed before
       // the requestTimeout.
       // if requestTimeout = 0, uses the generic requestTimeout configured in the gateway.
       int64 requestTimeout = 2;
    }
    

    Output: CreateWorkflowInstanceWithResultResponse

    message CreateWorkflowInstanceWithResultResponse {
      // the key of the workflow definition which was used to create the workflow instance
      int64 workflowKey = 1;
      // the BPMN process ID of the workflow definition which was used to create the workflow
      // instance
      string bpmnProcessId = 2;
      // the version of the workflow definition which was used to create the workflow instance
      int32 version = 3;
      // the unique identifier of the created workflow instance; to be used wherever a request
      // needs a workflow instance key (e.g. CancelWorkflowInstanceRequest)
      int64 workflowInstanceKey = 4;
      // JSON document consisting of all variables visible in the root scope
      string variables = 5;
    }
    

    Errors

    GRPC_STATUS_NOT_FOUND

    Returned if:

    • no workflow with the given key exists (if workflowKey was given)
    • no workflow with the given process ID exists (if bpmnProcessId was given but version was -1)
    • no workflow with the given process ID and version exists (if both bpmnProcessId and version were given)
    GRPC_STATUS_FAILED_PRECONDITION

    Returned if:

    • the workflow definition does not contain a none start event; only workflows with a none start event can be started manually.
    GRPC_STATUS_INVALID_ARGUMENT

    Returned if:

    • the given variables argument is not a valid JSON document; it is expected to be a valid JSON document where the root node is an object.
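
    For illustration, the Java client exposes this RPC through withResult() on the create-instance command; the following minimal sketch assumes a deployed workflow with the BPMN process ID order-process, as in the tutorial below.

    import io.zeebe.client.ZeebeClient;
    import io.zeebe.client.api.response.WorkflowInstanceResult;
    
    public class CreateInstanceWithResultExample {
      public static void main(final String[] args) {
        try (final ZeebeClient client = ZeebeClient.newClientBuilder()
            .gatewayAddress("127.0.0.1:26500").usePlaintext().build()) {
          final WorkflowInstanceResult result = client.newCreateInstanceCommand()
              .bpmnProcessId("order-process")
              .latestVersion()   // equivalent to version = -1
              .withResult()      // wait until the instance completes
              .send()
              .join();
    
          System.out.println("Completed with variables: " + result.getVariables());
        }
      }
    }

    The same pattern is used in the WorkflowInstanceWithResultCreator example further below.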

    DeployWorkflow RPC

    Deploys one or more workflows to Zeebe. Note that this is an atomic call, i.e. either all workflows are deployed, or none of them are.

    Input: DeployWorkflowRequest

    message DeployWorkflowRequest {
      // List of workflow resources to deploy
      repeated WorkflowRequestObject workflows = 1;
    }
    
    message WorkflowRequestObject {
      enum ResourceType {
        // FILE type means the gateway will try to detect the resource type
        // using the file extension of the name field
        FILE = 0;
        BPMN = 1; // extension 'bpmn'
        YAML = 2; // extension 'yaml'
      }
    
      // the resource basename, e.g. myProcess.bpmn
      string name = 1;
      // the resource type; if set to BPMN or YAML then the file extension
      // is ignored
      ResourceType type = 2;
      // the process definition as a UTF8-encoded string
      bytes definition = 3;
    }
    

    Output: DeployWorkflowResponse

    message DeployWorkflowResponse {
      // the unique key identifying the deployment
      int64 key = 1;
      // a list of deployed workflows
      repeated WorkflowMetadata workflows = 2;
    }
    
    message WorkflowMetadata {
      // the bpmn process ID, as parsed during deployment; together with the version forms a
      // unique identifier for a specific workflow definition
      string bpmnProcessId = 1;
      // the assigned process version
      int32 version = 2;
      // the assigned key, which acts as a unique identifier for this workflow
      int64 workflowKey = 3;
      // the resource name (see: WorkflowRequestObject.name) from which this workflow was
      // parsed
      string resourceName = 4;
    }
    

    Errors

    GRPC_STATUS_INVALID_ARGUMENT

    Returned if:

    • no resources given.
    • at least one resource is invalid. A resource is considered invalid if:
      • it is not a BPMN or YAML file (currently detected through the file extension)
      • the resource data is not deserializable (e.g. detected as BPMN, but it's broken XML)
      • the workflow is invalid (e.g. an event-based gateway has an outgoing sequence flow to a task)

    FailJob RPC

    Marks the job as failed; if the retries argument is positive, then the job will be immediately activatable again, and a worker could try again to process it. If it is zero or negative however, an incident will be raised, tagged with the given errorMessage, and the job will not be activatable until the incident is resolved.

    Input: FailJobRequest

    message FailJobRequest {
      // the unique job identifier, as obtained when activating the job
      int64 jobKey = 1;
      // the amount of retries the job should have left
      int32 retries = 2;
      // an optional message describing why the job failed
      // this is particularly useful if a job runs out of retries and an incident is raised,
      // as this message can help explain why an incident was raised
      string errorMessage = 3;
    }
    

    Output: FailJobResponse

    message FailJobResponse {
    }
    

    Errors

    GRPC_STATUS_NOT_FOUND

    Returned if:

    • no job was found with the given key
    GRPC_STATUS_FAILED_PRECONDITION

    Returned if:

    • the job was not activated
    • the job is already in a failed state, i.e. ran out of retries
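
    A minimal, hedged sketch of failing a job via the Java client; the job key and error message are hypothetical.

    import io.zeebe.client.ZeebeClient;
    
    public class FailJobExample {
      public static void main(final String[] args) {
        // hypothetical job key, e.g. taken from an ActivateJobsResponse
        final long jobKey = 123L;
    
        try (final ZeebeClient client = ZeebeClient.newClientBuilder()
            .gatewayAddress("127.0.0.1:26500").usePlaintext().build()) {
          client.newFailCommand(jobKey)
              .retries(0)                                    // no retries left -> an incident is raised
              .errorMessage("Payment provider unreachable")  // "errorMessage" field
              .send()
              .join();
        }
      }
    }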

    PublishMessage RPC

    Publishes a single message. Messages are published to specific partitions computed from their correlation keys.

    Input: Request

    message PublishMessageRequest {
      // the name of the message
      string name = 1;
      // the correlation key of the message
      string correlationKey = 2;
      // how long the message should be buffered on the broker, in milliseconds
      int64 timeToLive = 3;
      // the unique ID of the message; can be omitted. only useful to ensure only one message
      // with the given ID will ever be published (during its lifetime)
      string messageId = 4;
      // the message variables as a JSON document; to be valid, the root of the document must be an
      // object, e.g. { "a": "foo" }. [ "foo" ] would not be valid.
      string variables = 5;
    }
    

    Output: Response

    message PublishMessageResponse {
      // the unique ID of the message that was published
      int64 key = 1;
    }
    

    Errors

    GRPC_STATUS_ALREADY_EXISTS

    Returned if:

    • a message with the same ID was previously published (and is still alive)
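
    For illustration, publishing a message through the Java client could look like the following sketch; the message name, correlation key, message ID, and variables are hypothetical.

    import io.zeebe.client.ZeebeClient;
    import java.time.Duration;
    import java.util.Collections;
    
    public class PublishMessageExample {
      public static void main(final String[] args) {
        try (final ZeebeClient client = ZeebeClient.newClientBuilder()
            .gatewayAddress("127.0.0.1:26500").usePlaintext().build()) {
          client.newPublishMessageCommand()
              .messageName("payment-received")                       // "name" field
              .correlationKey("order-31243")                         // "correlationKey" field
              .timeToLive(Duration.ofMinutes(10))                    // "timeToLive" field
              .messageId("payment-received-order-31243")             // optional "messageId" field
              .variables(Collections.singletonMap("amount", 46.50))  // "variables" field
              .send()
              .join();
        }
      }
    }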

    ResolveIncident RPC

    Resolves a given incident. This simply marks the incident as resolved; most likely a call to UpdateJobRetries or SetVariables will be necessary to actually resolve the problem, followed by this call.

    Input: Request

    message ResolveIncidentRequest {
      // the unique ID of the incident to resolve
      int64 incidentKey = 1;
    }
    

    Output: Response

    message ResolveIncidentResponse {
    }
    

    Errors

    GRPC_STATUS_NOT_FOUND

    Returned if:

    • no incident with the given key exists
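
    A hedged sketch of resolving an incident via the Java client; the incident key is hypothetical.

    import io.zeebe.client.ZeebeClient;
    
    public class ResolveIncidentExample {
      public static void main(final String[] args) {
        // hypothetical incident key, e.g. taken from an exported incident record
        final long incidentKey = 456L;
    
        try (final ZeebeClient client = ZeebeClient.newClientBuilder()
            .gatewayAddress("127.0.0.1:26500").usePlaintext().build()) {
          // typically preceded by a call that fixes the cause, e.g. UpdateJobRetries
          client.newResolveIncidentCommand(incidentKey).send().join();
        }
      }
    }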

    SetVariables RPC

    Updates all the variables of a particular scope (e.g. workflow instance, flow element instance) from the given JSON document.

    Input: Request

    message SetVariablesRequest {
      // the unique identifier of a particular element; can be the workflow instance key (as
      // obtained during instance creation), or a given element, such as a service task (see
      // elementInstanceKey on the job message)
      int64 elementInstanceKey = 1;
      // a JSON serialized document describing variables as key value pairs; the root of the document
      // must be an object
      string variables = 2;
      // if true, the variables will be merged strictly into the local scope (as indicated by
      // elementInstanceKey); this means the variables are not propagated to upper scopes.
      // for example, let's say we have two scopes, '1' and '2', with each having effective variables as:
      // 1 => `{ "foo" : 2 }`, and 2 => `{ "bar" : 1 }`. if we send an update request with
      // elementInstanceKey = 2, variables `{ "foo" : 5 }`, and local is true, then scope 1 will
      // be unchanged, and scope 2 will now be `{ "bar" : 1, "foo" : 5 }`. if local was false, however,
      // then scope 1 would be `{ "foo" : 5 }`, and scope 2 would be `{ "bar" : 1 }`.
      bool local = 3;
    }
    

    Output: Response

    message SetVariablesResponse {
      // the unique key of the set variables command
      int64 key = 1;
    }
    

    Errors

    GRPC_STATUS_NOT_FOUND

    Returned if:

    • no element with the given elementInstanceKey exists
    GRPC_STATUS_INVALID_ARGUMENT

    Returned if:

    • the given variables document is not a valid JSON document; it is expected to be a valid JSON document where the root node is an object.
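
    As an illustration, setting variables through the Java client might look like this sketch; the element instance key and variable values are hypothetical.

    import io.zeebe.client.ZeebeClient;
    import java.util.Collections;
    
    public class SetVariablesExample {
      public static void main(final String[] args) {
        // hypothetical element instance key; a workflow instance key also works here
        final long elementInstanceKey = 789L;
    
        try (final ZeebeClient client = ZeebeClient.newClientBuilder()
            .gatewayAddress("127.0.0.1:26500").usePlaintext().build()) {
          client.newSetVariablesCommand(elementInstanceKey)
              .variables(Collections.singletonMap("totalPrice", 46.50)) // "variables" field
              .local(true)                                              // "local" field
              .send()
              .join();
        }
      }
    }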

    ThrowError RPC

    Throws an error to indicate that a business error has occurred while processing the job. The error is identified by an error code and is handled by an error catch event in the workflow with the same error code.

    Input: ThrowErrorRequest

    message ThrowErrorRequest {
      // the unique job identifier, as obtained when activating the job
      int64 jobKey = 1;
      // the error code that will be matched with an error catch event
      string errorCode = 2;
      // an optional error message that provides additional context
      string errorMessage = 3;
    }
    

    Output: ThrowErrorResponse

    message ThrowErrorResponse {
    }
    

    Errors

    GRPC_STATUS_NOT_FOUND

    Returned if:

    • no job was found with the given key
    GRPC_STATUS_FAILED_PRECONDITION

    Returned if:

    • the job is already in a failed state, i.e. ran out of retries
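
    A minimal sketch of throwing a business error via the Java client; the job key and error code are hypothetical, and the error code must match an error catch event in the workflow.

    import io.zeebe.client.ZeebeClient;
    
    public class ThrowErrorExample {
      public static void main(final String[] args) {
        // hypothetical job key, e.g. taken from an ActivateJobsResponse
        final long jobKey = 123L;
    
        try (final ZeebeClient client = ZeebeClient.newClientBuilder()
            .gatewayAddress("127.0.0.1:26500").usePlaintext().build()) {
          client.newThrowErrorCommand(jobKey)
              .errorCode("insufficient-funds")            // matched against error catch events
              .errorMessage("Balance too low for order")  // optional context
              .send()
              .join();
        }
      }
    }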

    Topology RPC

    Obtains the current topology of the cluster the gateway is part of.

    Input: TopologyRequest

    message TopologyRequest {
    }
    

    Output: TopologyResponse

    message TopologyResponse {
      // list of brokers part of this cluster
      repeated BrokerInfo brokers = 1;
      // how many nodes are in the cluster
      int32 clusterSize = 2;
      // how many partitions are spread across the cluster
      int32 partitionsCount = 3;
      // configured replication factor for this cluster
      int32 replicationFactor = 4;
      // gateway version
      string gatewayVersion = 5;
    }
    
    message BrokerInfo {
      // unique (within a cluster) node ID for the broker
      int32 nodeId = 1;
      // hostname of the broker
      string host = 2;
      // port for the broker
      int32 port = 3;
      // list of partitions managed or replicated on this broker
      repeated Partition partitions = 4;
      // broker version
      string version = 5;
    }
    
    message Partition {
      // Describes the Raft role of the broker for a given partition
      enum PartitionBrokerRole {
        LEADER = 0;
        FOLLOWER = 1;
      }
    
      // the unique ID of this partition
      int32 partitionId = 1;
      // the role of the broker for this partition
      PartitionBrokerRole role = 2;
    }
    

    Errors

    No specific errors

    UpdateJobRetries RPC

    Updates the number of retries a job has left. This is mostly useful for jobs that have run out of retries, once the underlying problem has been solved.

    Input: Request

    message UpdateJobRetriesRequest {
      // the unique job identifier, as obtained through ActivateJobs
      int64 jobKey = 1;
      // the new amount of retries for the job; must be positive
      int32 retries = 2;
    }
    

    Output: Response

    message UpdateJobRetriesResponse {
    }
    

    Errors

    GRPC_STATUS_NOT_FOUND

    Returned if:

    • no job exists with the given key
    GRPC_STATUS_INVALID_ARGUMENT

    Returned if:

    • retries is not greater than 0
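
    For illustration, a hedged sketch of updating job retries via the Java client; the job key is hypothetical.

    import io.zeebe.client.ZeebeClient;
    
    public class UpdateJobRetriesExample {
      public static void main(final String[] args) {
        // hypothetical key of a job that ran out of retries
        final long jobKey = 123L;
    
        try (final ZeebeClient client = ZeebeClient.newClientBuilder()
            .gatewayAddress("127.0.0.1:26500").usePlaintext().build()) {
          client.newUpdateRetriesCommand(jobKey)
              .retries(3) // must be positive
              .send()
              .join();
          // resolving the related incident afterwards makes the job activatable again
        }
      }
    }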

    Exporters

    Regardless of how an exporter is loaded (whether through an external JAR or not), all exporters interact in the same way with the broker, which is defined by the Exporter interface.

    Loading

    Once configured, exporters are loaded as part of the broker startup phase, before any processing is done.

    During the loading phase, the configuration for each exporter is validated, such that the broker will not start if:

    • An exporter ID is not unique
    • An exporter points to a non-existent/non-accessible JAR
    • An exporter points to a non-existent/non-instantiable class
    • An exporter instance throws an exception in its Exporter#configure method.

    The last point allows individual exporters to perform lightweight validation of their configuration (e.g. fail if arguments are missing).

    One caveat of the last point is that an instance of the exporter is created and immediately thrown away; therefore, exporters should not perform any computationally heavy work during instantiation/configuration.

    Note: Zeebe will create an isolated class loader for every JAR referenced by exporter configurations - that is, only once per JAR; if the same JAR is reused to define different exporters, then these will share the same class loader.

    This has some nice properties, primarily that different exporters can depend on the same third party libraries without having to worry about versions, or class name collisions.

    Additionally, exporters use the system class loader for system classes, or classes packaged as part of the Zeebe JAR.

    Exporter specific configuration is handled through the exporter's [exporters.args] nested map. This provides a simple Map<String, Object> which is passed directly in the form of a Configuration object when the broker calls the Exporter#configure(Configuration) method.

    Configuration occurs at two different phases: during the broker startup phase, and once every time a leader is elected for a partition.

    Processing

    At any given point, there is exactly one leader node for a given partition. Whenever a node becomes the leader for a partition, one of the things it will do is run an instance of an exporter stream processor.

    This stream processor will create exactly one instance of each configured exporter, and forward every record written on the stream to each of these in turn.

    Note: this implies that there will be exactly one instance of every exporter for every partition: if you have 4 partitions, and at least 4 threads for processing, then there are potentially 4 instances of your exporter exporting simultaneously.

    Note that Zeebe only guarantees at-least-once semantics, that is, a record will be seen at least once by an exporter, and maybe more. Cases where this may happen include:

    • During reprocessing after raft failover (i.e. new leader election)
    • On error if the position has not been updated yet

    To reduce the number of duplicate records an exporter will process, the stream processor keeps track of the position of the last successfully exported record for every single exporter; the position is sufficient since a stream is an ordered sequence of records whose position is monotonically increasing. This position is set by the exporter itself once it can guarantee a record has been successfully exported.

    Note: although Zeebe tries to reduce the number of duplicate records an exporter has to handle, duplicates are still likely to occur; therefore, export operations must be idempotent.

    Idempotency can be implemented in the exporter itself, but if it exports to an external system, it is recommended that you perform deduplication there to reduce the load on Zeebe itself. Refer to the exporter-specific documentation for how this is meant to be achieved.
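
    To make this lifecycle concrete, here is a hedged sketch of a minimal exporter. The interface and package names (io.zeebe.exporter.api.Exporter, Context, Controller) and the exact method signatures are assumptions about the exporter API at the time of writing and may differ between versions.

    import io.zeebe.exporter.api.Exporter;
    import io.zeebe.exporter.api.context.Context;
    import io.zeebe.exporter.api.context.Controller;
    import io.zeebe.protocol.record.Record;
    
    // Hedged sketch of a minimal exporter that logs every record; not an official example.
    public class LoggingExporter implements Exporter {
      private Controller controller;
    
      @Override
      public void configure(final Context context) {
        // lightweight validation only; throwing here prevents the broker from starting
      }
    
      @Override
      public void open(final Controller controller) {
        this.controller = controller;
      }
    
      @Override
      public void export(final Record record) {
        // export must be idempotent: the same record may be seen more than once
        System.out.println(record.toJson());
    
        // acknowledge progress so reprocessing after failover can skip this record
        controller.updateLastExportedRecordPosition(record.getPosition());
      }
    }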

    Error handling

    If an error occurs during the Exporter#open phase, the stream processor will fail and be restarted, potentially fixing the error; in the worst case, this means no exporter is running at all until these errors stop.

    If an error occurs during the Exporter#close phase, it will be logged, but will still allow other exporters to gracefully finish their work.

    If an error occurs during processing, the same record will be retried indefinitely until no error is produced. In the worst case, this means a failing exporter could bring all exporters to a halt. Currently, exporter implementations are expected to implement their own retry/error-handling strategies, though this may change in the future.

    Performance impact

    Zeebe naturally incurs a performance impact for each loaded exporter. A slow exporter will slow down all other exporters for a given partition, and, in the worst case, could completely block a thread.

    It's therefore recommended to keep exporters as simple as possible, and perform any data enrichment or transformation through the external system.

    Zeebe Client Libraries

    Applications that leverage the Zeebe broker need to be written using a client library that implements the Zeebe gRPC API.

    Zeebe provides officially supported Java and Go clients, and there are many excellent community-supported clients available for other programming languages.

    Zeebe Java Client

    Setting up the Zeebe Java Client

    Prerequisites

    • Java 8

    Usage in a Maven project

    To use the Java client library, declare the following Maven dependency in your project:

    <dependency>
      <groupId>io.zeebe</groupId>
      <artifactId>zeebe-client-java</artifactId>
      <version>${zeebe.version}</version>
    </dependency>
    

    The version of the client should always match the broker's version.

    Bootstrapping

    In Java code, instantiate the client as follows:

    ZeebeClient client = ZeebeClient.newClientBuilder()
      .gatewayAddress("127.0.0.1:26500")
      .usePlaintext()
      .build();
    

    See the class io.zeebe.client.ZeebeClientBuilder for a description of all available configuration properties.

    Get Started with the Java client

    In this tutorial, you will learn to use the Java client in a Java application to interact with Zeebe.

    You will be guided through the following steps:

    You can find the complete source code, including the BPMN diagrams, on GitHub.

    You can watch a video walk-through of this guide on the Zeebe YouTube channel here.

    Prerequisites

    One of the following:

    (Using Docker)

    (Not using Docker)

    Start the broker

    Before you begin to set up your project, please start the broker.

    If you are using Docker with zeebe-docker-compose, then change into the simple-monitor subdirectory, and run docker-compose up.

    If you are not using Docker, run the start up script bin/broker or bin/broker.bat in the distribution.

    By default, the broker binds to localhost:26500, which is used as the contact point in this guide.

    Set up a project

    First, we need a Maven project. Create a new project using your IDE, or run the Maven command:

    mvn archetype:generate \
        -DgroupId=io.zeebe \
        -DartifactId=zeebe-get-started-java-client \
        -DarchetypeArtifactId=maven-archetype-quickstart \
        -DinteractiveMode=false
    

    Add the Zeebe client library as a dependency to the project's pom.xml:

    <dependency>
      <groupId>io.zeebe</groupId>
      <artifactId>zeebe-client-java</artifactId>
      <version>${zeebe.version}</version>
    </dependency>
    

    Create a main class and add the following lines to bootstrap the Zeebe client:

    package io.zeebe;
    
    import io.zeebe.client.ZeebeClient;
    
    public class App
    {
        public static void main(final String[] args)
        {
            final ZeebeClient client = ZeebeClient.newClientBuilder()
                // change the contact point if needed
                .gatewayAddress("127.0.0.1:26500")
                .usePlaintext()
                .build();
    
            System.out.println("Connected.");
    
            // ...
    
            client.close();
            System.out.println("Closed.");
        }
    }
    

    Run the program:

    • If you use an IDE, you can just execute the main class from within the IDE.
    • Otherwise, you must build an executable JAR file with Maven and execute it.

    Interlude: Build an executable JAR file

    Add the Maven Shade plugin to your pom.xml:

    <!-- Maven Shade Plugin -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>2.3</version>
      <executions>
        <!-- Run shade goal on package phase -->
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <transformers>
              <!-- add Main-Class to manifest file -->
              <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                <mainClass>io.zeebe.App</mainClass>
              </transformer>
            </transformers>
          </configuration>
        </execution>
      </executions>
    </plugin>
    

    Now run mvn package, and it will generate a JAR file in the target subdirectory. You can run this with java -jar target/${JAR file}.

    Output of executing program

    You should see the output:

    Connected.
    
    Closed.
    

    Model a workflow

    Now, we need a first workflow which can then be deployed. Later, we will extend the workflow with more functionality.

    Open the Zeebe Modeler and create a new BPMN diagram. Add a start event and an end event to the diagram and connect the events.

    model-workflow-step-1

    Set the id to order-process (the BPMN process id), and mark the diagram as executable.

    Save the diagram as src/main/resources/order-process.bpmn under the project's folder.

    Deploy a workflow

    Next, we want to deploy the modeled workflow to the broker.

    The broker stores the workflow under its BPMN process id and assigns a version.

    Add the following deploy command to the main class:

    package io.zeebe;
    
    import io.zeebe.client.api.response.DeploymentEvent;
    
    public class App
    {
        public static void main(final String[] args)
        {
            // after the client is connected
    
            final DeploymentEvent deployment = client.newDeployCommand()
                .addResourceFromClasspath("order-process.bpmn")
                .send()
                .join();
    
            final int version = deployment.getWorkflows().get(0).getVersion();
            System.out.println("Workflow deployed. Version: " + version);
    
            // ...
        }
    }
    

    Run the program and verify that the workflow is deployed successfully. You should see the output:

    Workflow deployed. Version: 1
    

    Create a workflow instance

    Finally, we are ready to create a first instance of the deployed workflow. A workflow instance is created from a specific version of the workflow, which can be set on creation.

    Add the following create command to the main class:

    package io.zeebe;
    
    import io.zeebe.client.api.response.WorkflowInstanceEvent;
    
    public class App
    {
        public static void main(final String[] args)
        {
            // after the workflow is deployed
    
            final WorkflowInstanceEvent wfInstance = client.newCreateInstanceCommand()
                .bpmnProcessId("order-process")
                .latestVersion()
                .send()
                .join();
    
            final long workflowInstanceKey = wfInstance.getWorkflowInstanceKey();
    
            System.out.println("Workflow instance created. Key: " + workflowInstanceKey);
    
            // ...
        }
    }
    

    Run the program and verify that the workflow instance is created. You should see the output:

    Workflow instance created. Key: 2113425532
    

    You did it! Do you want to see how the workflow instance is executed?

    If you are running with Docker, just open http://localhost:8082 in your browser.

    If you are running without Docker:

    • Start the Zeebe Monitor using java -jar zeebe-simple-monitor-app-*.jar.
    • Open a web browser and go to http://localhost:8080/.

    In the Simple Monitor interface, you see the current state of the workflow instance.

    zeebe-monitor-step-1

    Work on a job

    Now we want to do some work within your workflow. First, add a few service jobs to the BPMN diagram and set the required attributes. Then extend your main class and create a job worker to process jobs which are created when the workflow instance reaches a service task.

    Open the BPMN diagram in the Zeebe Modeler. Insert a few service tasks between the start and the end event.

    model-workflow-step-2

    You need to set the type of each task, which identifies the nature of the work to be performed. Set the type of the first task to 'payment-service'.

    Set the type of the second task to 'fetcher-service'.

    Set the type of the third task to 'shipping-service'.

    Save the BPMN diagram and switch back to the main class.

    Add the following lines to create a job worker for the first job type:

    package io.zeebe;
    
    import io.zeebe.client.api.worker.JobWorker;
    
    public class App
    {
        public static void main(final String[] args)
        {
            // after the workflow instance is created
    
            try (final JobWorker jobWorker = client.newWorker()
                .jobType("payment-service")
                .handler((jobClient, job) ->
                {
                    System.out.println("Collect money");

                    // ...

                    jobClient.newCompleteCommand(job.getKey())
                        .send()
                        .join();
                })
                .open())
            {
                // the worker polls for jobs and hands them to the handler above;
                // it is closed automatically at the end of this try-with-resources block

                // ...
            }
    
        }
    }
    

    Run the program and verify that the job is processed. You should see the output:

    Collect money
    

    If you have a look at the Zeebe Monitor, you can see that the workflow instance has moved from the first service task to the next one:

    zeebe-monitor-step-2

    Work with data

    Usually, a workflow is more than just tasks; there is also a data flow. The worker gets the data from the workflow instance to do its work and sends the result back to the workflow instance.

    In Zeebe, the data is stored as key-value pairs in the form of variables. Variables can be set when the workflow instance is created. Within the workflow, variables can be read and modified by workers.

    In our example, we want to create a workflow instance with the following variables:

    "orderId": 31243
    "orderItems": [435, 182, 376]
    

    The first task should read orderId as input and return totalPrice as result.

    Modify the workflow instance create command and pass the data as variables. Also, modify the job worker to read the job variables and complete the job with a result.

    package io.zeebe;
    
    import io.zeebe.client.api.response.WorkflowInstanceEvent;
    import io.zeebe.client.api.worker.JobWorker;
    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.Map;
    
    public class App
    {
        public static void main(final String[] args)
        {
            // after the workflow is deployed
    
            final Map<String, Object> data = new HashMap<>();
            data.put("orderId", 31243);
            data.put("orderItems", Arrays.asList(435, 182, 376));
    
            final WorkflowInstanceEvent wfInstance = client.newCreateInstanceCommand()
                .bpmnProcessId("order-process")
                .latestVersion()
                .variables(data)
                .send()
                .join();
    
            // ...
    
            final JobWorker jobWorker = client.newWorker()
                .jobType("payment-service")
                .handler((jobClient, job) ->
                {
                    final Map<String, Object> variables = job.getVariablesAsMap();
    
                    System.out.println("Process order: " + variables.get("orderId"));
                    double price = 46.50;
                    System.out.println("Collect money: $" + price);
    
                    // ...
    
                    final Map<String, Object> result = new HashMap<>();
                    result.put("totalPrice", price);
    
                    jobClient.newCompleteCommand(job.getKey())
                        .variables(result)
                        .send()
                        .join();
                })
                .fetchVariables("orderId")
                .open();
    
            // ...
        }
    }
    

    Run the program and verify that the variable is read. You should see the output:

    Process order: 31243
    Collect money: $46.50
    

    If we have a look at the Zeebe Monitor, we can see that the variable totalPrice is set:

    zeebe-monitor-step-3

    What's next?

    Hurray! You finished this tutorial and learned the basic usage of the Java client.

    Next steps:

    Logging

    The client uses SLF4J for logging. It logs useful things, such as exception stack traces when a job handler fails execution. Using the SLF4J API, any SLF4J implementation can be plugged in. The following example uses Log4J 2.

    Maven dependencies

    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-slf4j-impl</artifactId>
      <version>2.8.1</version>
    </dependency>
    
    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-core</artifactId>
      <version>2.8.1</version>
    </dependency>
    

    Configuration

    Add a file called log4j2.xml to the classpath of your application. Add the following content:

    <?xml version="1.0" encoding="UTF-8"?>
    <Configuration status="WARN" strict="true"
        xmlns="http://logging.apache.org/log4j/2.0/config"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://logging.apache.org/log4j/2.0/config https://raw.githubusercontent.com/apache/logging-log4j2/log4j-2.8.1/log4j-core/src/main/resources/Log4j-config.xsd">
      <Appenders>
        <Console name="Console" target="SYSTEM_OUT">
          <PatternLayout pattern="%d{HH:mm:ss.SSS} [%t] %-5level Java Client: %logger{36} - %msg%n"/>
        </Console>
      </Appenders>
      <Loggers>
        <Root level="info">
          <AppenderRef ref="Console"/>
        </Root>
      </Loggers>
    </Configuration>
    

    This will log every log message to the console.

    Writing Tests

    You can use the zeebe-test module to write JUnit tests for your job worker and BPMN workflow. It provides a JUnit rule to bootstrap the broker and some basic assertions.

    Usage in a Maven project

    Add zeebe-test as a Maven test dependency to your project:

    <dependency>
      <groupId>io.zeebe</groupId>
      <artifactId>zeebe-test</artifactId>
      <scope>test</scope>
    </dependency>
    

    Bootstrap the Broker

    Use the ZeebeTestRule in your test case to start an embedded broker. It contains a client which can be used to deploy a BPMN workflow or create an instance.

    import io.zeebe.client.ZeebeClient;
    import io.zeebe.client.api.response.WorkflowInstanceEvent;
    import org.junit.Before;
    import org.junit.Rule;
    import org.junit.Test;
    
    public class MyTest {
    
      @Rule public final ZeebeTestRule testRule = new ZeebeTestRule();
    
      private ZeebeClient client;
    
      @Test
      public void test() {
      	client = testRule.getClient();
    
        client
            .newDeployCommand()
            .addResourceFromClasspath("process.bpmn")
            .send()
            .join();
    
        final WorkflowInstanceEvent workflowInstance =
            client
                .newCreateInstanceCommand()
                .bpmnProcessId("process")
                .latestVersion()
                .send()
                .join();
      }
    }
    

    Verify the Result

    The ZeebeTestRule also provides some basic assertions in AssertJ style. The entry point of the assertions is ZeebeTestRule.assertThat(...).

    final WorkflowInstanceEvent workflowInstance = ...
    
    ZeebeTestRule.assertThat(workflowInstance)
        .isEnded()
        .hasPassed("start", "task", "end")
        .hasVariable("result", 21.0);
    

    Example Code using the Zeebe Java Client

    These examples are accessible in the zeebe-io github repository at commit ea36b4fab5c4883d385e32dd9ead6e01dcfdbf26. Link to browse code on github.

    Instructions to access code locally:

    git clone https://github.com/zeebe-io/zeebe.git
    git checkout ea36b4fab5c4883d385e32dd9ead6e01dcfdbf26
    cd zeebe/samples
    

    Import the Maven project in the samples directory into your IDE to start hacking.

    Workflow

    Job

    Data

    Cluster

    Deploy a Workflow

    Related Resources

    Prerequisites

    1. Running Zeebe broker with endpoint localhost:26500 (default)

    WorkflowDeployer.java

    Source on github

    /*
     * Copyright Camunda Services GmbH and/or licensed to Camunda Services GmbH under
     * one or more contributor license agreements. See the NOTICE file distributed
     * with this work for additional information regarding copyright ownership.
     * Licensed under the Zeebe Community License 1.0. You may not use this file
     * except in compliance with the Zeebe Community License 1.0.
     */
    package io.zeebe.example.workflow;
    
    import io.zeebe.client.ZeebeClient;
    import io.zeebe.client.ZeebeClientBuilder;
    import io.zeebe.client.api.response.DeploymentEvent;
    
    public final class WorkflowDeployer {
    
      public static void main(final String[] args) {
        final String broker = "localhost:26500";
    
        final ZeebeClientBuilder clientBuilder =
            ZeebeClient.newClientBuilder().brokerContactPoint(broker).usePlaintext();
    
        try (final ZeebeClient client = clientBuilder.build()) {
    
          final DeploymentEvent deploymentEvent =
              client.newDeployCommand().addResourceFromClasspath("demoProcess.bpmn").send().join();
    
          System.out.println("Deployment created with key: " + deploymentEvent.getKey());
        }
      }
    }
    

    demoProcess.bpmn

    Source on github

    Download the XML and save it in the Java classpath before running the example. Open the file with Zeebe Modeler for a graphical representation.

    <?xml version="1.0" encoding="UTF-8"?>
    <bpmn:definitions xmlns:bpmn="http://www.omg.org/spec/BPMN/20100524/MODEL" xmlns:bpmndi="http://www.omg.org/spec/BPMN/20100524/DI" xmlns:di="http://www.omg.org/spec/DD/20100524/DI" xmlns:dc="http://www.omg.org/spec/DD/20100524/DC" xmlns:camunda="http://camunda.org/schema/1.0/bpmn" xmlns:zeebe="http://camunda.org/schema/zeebe/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" id="Definitions_1" targetNamespace="http://bpmn.io/schema/bpmn" exporter="Camunda Modeler" exporterVersion="1.5.0-nightly">
      <bpmn:process id="demoProcess" isExecutable="true">
        <bpmn:startEvent id="start" name="start">
          <bpmn:outgoing>SequenceFlow_1sz6737</bpmn:outgoing>
        </bpmn:startEvent>
        <bpmn:sequenceFlow id="SequenceFlow_1sz6737" sourceRef="start" targetRef="taskA" />
        <bpmn:sequenceFlow id="SequenceFlow_06ytcxw" sourceRef="taskA" targetRef="taskB" />
        <bpmn:sequenceFlow id="SequenceFlow_1oh45y7" sourceRef="taskB" targetRef="taskC" />
        <bpmn:endEvent id="end" name="end">
          <bpmn:incoming>SequenceFlow_148rk2p</bpmn:incoming>
        </bpmn:endEvent>
        <bpmn:sequenceFlow id="SequenceFlow_148rk2p" sourceRef="taskC" targetRef="end" />
        <bpmn:serviceTask id="taskA" name="task A">
          <bpmn:extensionElements>
            <zeebe:taskDefinition type="foo" />
          </bpmn:extensionElements>
          <bpmn:incoming>SequenceFlow_1sz6737</bpmn:incoming>
          <bpmn:outgoing>SequenceFlow_06ytcxw</bpmn:outgoing>
        </bpmn:serviceTask>
        <bpmn:serviceTask id="taskB" name="task B">
          <bpmn:extensionElements>
            <zeebe:taskDefinition type="bar" />
          </bpmn:extensionElements>
          <bpmn:incoming>SequenceFlow_06ytcxw</bpmn:incoming>
          <bpmn:outgoing>SequenceFlow_1oh45y7</bpmn:outgoing>
        </bpmn:serviceTask>
        <bpmn:serviceTask id="taskC" name="task C">
          <bpmn:extensionElements>
            <zeebe:taskDefinition type="foo" />
          </bpmn:extensionElements>
          <bpmn:incoming>SequenceFlow_1oh45y7</bpmn:incoming>
          <bpmn:outgoing>SequenceFlow_148rk2p</bpmn:outgoing>
        </bpmn:serviceTask>
      </bpmn:process>
      <bpmndi:BPMNDiagram id="BPMNDiagram_1">
        <bpmndi:BPMNPlane id="BPMNPlane_1" bpmnElement="demoProcess">
          <bpmndi:BPMNShape id="_BPMNShape_StartEvent_2" bpmnElement="start">
            <dc:Bounds x="173" y="102" width="36" height="36" />
            <bpmndi:BPMNLabel>
              <dc:Bounds x="180" y="138" width="22" height="12" />
            </bpmndi:BPMNLabel>
          </bpmndi:BPMNShape>
          <bpmndi:BPMNEdge id="SequenceFlow_1sz6737_di" bpmnElement="SequenceFlow_1sz6737">
            <di:waypoint xsi:type="dc:Point" x="209" y="120" />
            <di:waypoint xsi:type="dc:Point" x="310" y="120" />
            <bpmndi:BPMNLabel>
              <dc:Bounds x="260" y="105" width="0" height="0" />
            </bpmndi:BPMNLabel>
          </bpmndi:BPMNEdge>
          <bpmndi:BPMNEdge id="SequenceFlow_06ytcxw_di" bpmnElement="SequenceFlow_06ytcxw">
            <di:waypoint xsi:type="dc:Point" x="410" y="120" />
            <di:waypoint xsi:type="dc:Point" x="502" y="120" />
            <bpmndi:BPMNLabel>
              <dc:Bounds x="456" y="105" width="0" height="0" />
            </bpmndi:BPMNLabel>
          </bpmndi:BPMNEdge>
          <bpmndi:BPMNEdge id="SequenceFlow_1oh45y7_di" bpmnElement="SequenceFlow_1oh45y7">
            <di:waypoint xsi:type="dc:Point" x="602" y="120" />
            <di:waypoint xsi:type="dc:Point" x="694" y="120" />
            <bpmndi:BPMNLabel>
              <dc:Bounds x="648" y="105" width="0" height="0" />
            </bpmndi:BPMNLabel>
          </bpmndi:BPMNEdge>
          <bpmndi:BPMNShape id="EndEvent_0gbv3sc_di" bpmnElement="end">
            <dc:Bounds x="867" y="102" width="36" height="36" />
            <bpmndi:BPMNLabel>
              <dc:Bounds x="876" y="138" width="18" height="12" />
            </bpmndi:BPMNLabel>
          </bpmndi:BPMNShape>
          <bpmndi:BPMNEdge id="SequenceFlow_148rk2p_di" bpmnElement="SequenceFlow_148rk2p">
            <di:waypoint xsi:type="dc:Point" x="794" y="120" />
            <di:waypoint xsi:type="dc:Point" x="867" y="120" />
            <bpmndi:BPMNLabel>
              <dc:Bounds x="831" y="105" width="0" height="0" />
            </bpmndi:BPMNLabel>
          </bpmndi:BPMNEdge>
          <bpmndi:BPMNShape id="ServiceTask_09m0goq_di" bpmnElement="taskA">
            <dc:Bounds x="310" y="80" width="100" height="80" />
          </bpmndi:BPMNShape>
          <bpmndi:BPMNShape id="ServiceTask_0sryj72_di" bpmnElement="taskB">
            <dc:Bounds x="502" y="80" width="100" height="80" />
          </bpmndi:BPMNShape>
          <bpmndi:BPMNShape id="ServiceTask_1xu4l3g_di" bpmnElement="taskC">
            <dc:Bounds x="694" y="80" width="100" height="80" />
          </bpmndi:BPMNShape>
        </bpmndi:BPMNPlane>
      </bpmndi:BPMNDiagram>
    </bpmn:definitions>
    

    Create a Workflow Instance

    Prerequisites

    1. Running Zeebe broker with endpoint localhost:26500 (default)
    2. Run the Deploy a Workflow example

    WorkflowInstanceCreator.java

    Source on github

    /*
     * Copyright Camunda Services GmbH and/or licensed to Camunda Services GmbH under
     * one or more contributor license agreements. See the NOTICE file distributed
     * with this work for additional information regarding copyright ownership.
     * Licensed under the Zeebe Community License 1.0. You may not use this file
     * except in compliance with the Zeebe Community License 1.0.
     */
    package io.zeebe.example.workflow;
    
    import io.zeebe.client.ZeebeClient;
    import io.zeebe.client.ZeebeClientBuilder;
    import io.zeebe.client.api.response.WorkflowInstanceEvent;
    
    public final class WorkflowInstanceCreator {
    
      public static void main(final String[] args) {
        final String broker = "127.0.0.1:26500";
    
        final String bpmnProcessId = "demoProcess";
    
        final ZeebeClientBuilder builder =
            ZeebeClient.newClientBuilder().brokerContactPoint(broker).usePlaintext();
    
        try (final ZeebeClient client = builder.build()) {
    
          System.out.println("Creating workflow instance");
    
          final WorkflowInstanceEvent workflowInstanceEvent =
              client
                  .newCreateInstanceCommand()
                  .bpmnProcessId(bpmnProcessId)
                  .latestVersion()
                  .send()
                  .join();
    
          System.out.println(
              "Workflow instance created with key: " + workflowInstanceEvent.getWorkflowInstanceKey());
        }
      }
    }
    

    Create Workflow Instances Non-Blocking

    Prerequisites

    1. Running Zeebe broker with endpoint localhost:26500 (default)
    2. Run the Deploy a Workflow example

    NonBlockingWorkflowInstanceCreator.java

    Source on github

    /*
     * Copyright Camunda Services GmbH and/or licensed to Camunda Services GmbH under
     * one or more contributor license agreements. See the NOTICE file distributed
     * with this work for additional information regarding copyright ownership.
     * Licensed under the Zeebe Community License 1.0. You may not use this file
     * except in compliance with the Zeebe Community License 1.0.
     */
    package io.zeebe.example.workflow;
    
    import io.zeebe.client.ZeebeClient;
    import io.zeebe.client.ZeebeClientBuilder;
    import io.zeebe.client.api.ZeebeFuture;
    import io.zeebe.client.api.response.WorkflowInstanceEvent;
    
    public final class NonBlockingWorkflowInstanceCreator {
      public static void main(final String[] args) {
        final String broker = "127.0.0.1:26500";
        final int numberOfInstances = 100_000;
        final String bpmnProcessId = "demoProcess";
    
        final ZeebeClientBuilder builder =
            ZeebeClient.newClientBuilder().brokerContactPoint(broker).usePlaintext();
    
        try (final ZeebeClient client = builder.build()) {
          System.out.println("Creating " + numberOfInstances + " workflow instances");
    
          final long startTime = System.currentTimeMillis();
    
          long instancesCreating = 0;
    
          while (instancesCreating < numberOfInstances) {
            // this is non-blocking/async => returns a future
            final ZeebeFuture<WorkflowInstanceEvent> future =
                client.newCreateInstanceCommand().bpmnProcessId(bpmnProcessId).latestVersion().send();
    
            // could put the future somewhere and eventually wait for its completion
    
            instancesCreating++;
          }
    
          // creating one more instance; joining on this future ensures
          // that all the other create commands were handled
          client.newCreateInstanceCommand().bpmnProcessId(bpmnProcessId).latestVersion().send().join();
    
          System.out.println("Took: " + (System.currentTimeMillis() - startTime));
        }
      }
    }
    

    Create a Workflow Instance and Await Result

    Prerequisites

    1. Running Zeebe broker with endpoint localhost:26500 (default)
    2. Run the Deploy a Workflow example. Deploy demoProcessSingleTask.bpmn instead of demoProcess.bpmn

    WorkflowInstanceWithResultCreator.java

    Source on github

    /*
     * Copyright Camunda Services GmbH and/or licensed to Camunda Services GmbH under
     * one or more contributor license agreements. See the NOTICE file distributed
     * with this work for additional information regarding copyright ownership.
     * Licensed under the Zeebe Community License 1.0. You may not use this file
     * except in compliance with the Zeebe Community License 1.0.
     */
    package io.zeebe.example.workflow;
    
    import io.zeebe.client.ZeebeClient;
    import io.zeebe.client.ZeebeClientBuilder;
    import io.zeebe.client.api.response.WorkflowInstanceResult;
    import java.time.Duration;
    import java.util.Map;
    
    public class WorkflowInstanceWithResultCreator {
      public static void main(final String[] args) {
        final String broker = "127.0.0.1:26500";
    
        final String bpmnProcessId = "demoProcessSingleTask";
    
        final ZeebeClientBuilder builder =
            ZeebeClient.newClientBuilder().brokerContactPoint(broker).usePlaintext();
    
        try (final ZeebeClient client = builder.build()) {
    
          openJobWorker(client); // open job workers so that tasks are executed and the workflow is completed
          System.out.println("Creating workflow instance");
    
          final WorkflowInstanceResult workflowInstanceResult =
              client
                  .newCreateInstanceCommand()
                  .bpmnProcessId(bpmnProcessId)
                  .latestVersion()
                  .withResult() // to await the completion of workflow execution and return result
                  .send()
                  .join();
    
          System.out.println(
              "Workflow instance created with key: "
                  + workflowInstanceResult.getWorkflowInstanceKey()
                  + " and completed with results: "
                  + workflowInstanceResult.getVariables());
        }
      }
    
      private static void openJobWorker(final ZeebeClient client) {
        client
            .newWorker()
            .jobType("foo")
            .handler(
                (jobClient, job) ->
                    jobClient
                        .newCompleteCommand(job.getKey())
                        .variables(Map.of("job", job.getKey()))
                        .send())
            .timeout(Duration.ofSeconds(10))
            .open();
      }
    }
    

    Open a Job Worker

    Related Resources

    Prerequisites

    1. Running Zeebe broker with endpoint localhost:26500 (default)
    2. Run the Deploy a Workflow example
    3. Run the Create a Workflow Instance example a couple of times

    JobWorkerCreator.java

    Source on github

    /*
     * Copyright Camunda Services GmbH and/or licensed to Camunda Services GmbH under
     * one or more contributor license agreements. See the NOTICE file distributed
     * with this work for additional information regarding copyright ownership.
     * Licensed under the Zeebe Community License 1.0. You may not use this file
     * except in compliance with the Zeebe Community License 1.0.
     */
    package io.zeebe.example.job;
    
    import io.zeebe.client.ZeebeClient;
    import io.zeebe.client.ZeebeClientBuilder;
    import io.zeebe.client.api.response.ActivatedJob;
    import io.zeebe.client.api.worker.JobClient;
    import io.zeebe.client.api.worker.JobHandler;
    import io.zeebe.client.api.worker.JobWorker;
    import java.time.Duration;
    import java.util.Scanner;
    
    public final class JobWorkerCreator {
      public static void main(final String[] args) {
        final String broker = "127.0.0.1:26500";
    
        final String jobType = "foo";
    
        final ZeebeClientBuilder builder =
            ZeebeClient.newClientBuilder().brokerContactPoint(broker).usePlaintext();
    
        try (final ZeebeClient client = builder.build()) {
    
          System.out.println("Opening job worker.");
    
          try (final JobWorker workerRegistration =
              client
                  .newWorker()
                  .jobType(jobType)
                  .handler(new ExampleJobHandler())
                  .timeout(Duration.ofSeconds(10))
                  .open()) {
            System.out.println("Job worker opened and receiving jobs.");
    
            // run until System.in receives exit command
            waitUntilSystemInput("exit");
          }
        }
      }
    
      private static void waitUntilSystemInput(final String exitCode) {
        try (final Scanner scanner = new Scanner(System.in)) {
          while (scanner.hasNextLine()) {
            final String nextLine = scanner.nextLine();
            if (nextLine.contains(exitCode)) {
              return;
            }
          }
        }
      }
    
      private static class ExampleJobHandler implements JobHandler {
        @Override
        public void handle(final JobClient client, final ActivatedJob job) {
          // here: business logic that is executed with every job
          System.out.println(job);
          client.newCompleteCommand(job.getKey()).send().join();
        }
      }
    }
    

    Handle variables as POJO

    Related Resources

    Prerequisites

    1. Running Zeebe broker with endpoint localhost:26500 (default)
    2. Run the Deploy a Workflow example

    HandleVariablesAsPojo.java

    Source on github

    /*
     * Copyright Camunda Services GmbH and/or licensed to Camunda Services GmbH under
     * one or more contributor license agreements. See the NOTICE file distributed
     * with this work for additional information regarding copyright ownership.
     * Licensed under the Zeebe Community License 1.0. You may not use this file
     * except in compliance with the Zeebe Community License 1.0.
     */
    package io.zeebe.example.data;
    
    import io.zeebe.client.ZeebeClient;
    import io.zeebe.client.ZeebeClientBuilder;
    import io.zeebe.client.api.response.ActivatedJob;
    import io.zeebe.client.api.worker.JobClient;
    import io.zeebe.client.api.worker.JobHandler;
    import java.util.Scanner;
    
    public final class HandleVariablesAsPojo {
      public static void main(final String[] args) {
        final String broker = "127.0.0.1:26500";
    
        final ZeebeClientBuilder builder =
            ZeebeClient.newClientBuilder().brokerContactPoint(broker).usePlaintext();
    
        try (final ZeebeClient client = builder.build()) {
          final Order order = new Order();
          order.setOrderId(31243);
    
          client
              .newCreateInstanceCommand()
              .bpmnProcessId("demoProcess")
              .latestVersion()
              .variables(order)
              .send()
              .join();
    
          client.newWorker().jobType("foo").handler(new DemoJobHandler()).open();
    
          // run until System.in receives exit command
          waitUntilSystemInput("exit");
        }
      }
    
      private static void waitUntilSystemInput(final String exitCode) {
        try (final Scanner scanner = new Scanner(System.in)) {
          while (scanner.hasNextLine()) {
            final String nextLine = scanner.nextLine();
            if (nextLine.contains(exitCode)) {
              return;
            }
          }
        }
      }
    
      public static class Order {
        private long orderId;
        private double totalPrice;
    
        public long getOrderId() {
          return orderId;
        }
    
        public void setOrderId(final long orderId) {
          this.orderId = orderId;
        }
    
        public double getTotalPrice() {
          return totalPrice;
        }
    
        public void setTotalPrice(final double totalPrice) {
          this.totalPrice = totalPrice;
        }
      }
    
      private static class DemoJobHandler implements JobHandler {
        @Override
        public void handle(final JobClient client, final ActivatedJob job) {
          // read the variables of the job
          final Order order = job.getVariablesAsType(Order.class);
          System.out.println("new job with orderId: " + order.getOrderId());
    
          // update the variables and complete the job
          order.setTotalPrice(46.50);
    
          client.newCompleteCommand(job.getKey()).variables(order).send();
        }
      }
    }
    

    Request Cluster Topology

    Shows which broker is leader and follower for which partition. Particularly useful when you run a cluster with multiple Zeebe brokers.

    Related Resources

    Prerequisites

    1. Running Zeebe broker with endpoint localhost:26500 (default)

    TopologyViewer.java

    Source on github

    /*
     * Copyright Camunda Services GmbH and/or licensed to Camunda Services GmbH under
     * one or more contributor license agreements. See the NOTICE file distributed
     * with this work for additional information regarding copyright ownership.
     * Licensed under the Zeebe Community License 1.0. You may not use this file
     * except in compliance with the Zeebe Community License 1.0.
     */
    package io.zeebe.example.cluster;
    
    import io.zeebe.client.ZeebeClient;
    import io.zeebe.client.ZeebeClientBuilder;
    import io.zeebe.client.api.response.Topology;
    
    public final class TopologyViewer {
    
      public static void main(final String[] args) {
        final String broker = "127.0.0.1:26500";
    
        final ZeebeClientBuilder builder =
            ZeebeClient.newClientBuilder().brokerContactPoint(broker).usePlaintext();
    
        try (final ZeebeClient client = builder.build()) {
          System.out.println("Requesting topology with initial contact point " + broker);
    
          final Topology topology = client.newTopologyRequest().send().join();
    
          System.out.println("Topology:");
          topology
              .getBrokers()
              .forEach(
                  b -> {
                    System.out.println("    " + b.getAddress());
                    b.getPartitions()
                        .forEach(
                            p ->
                                System.out.println(
                                    "      " + p.getPartitionId() + " - " + p.getRole()));
                  });
    
          System.out.println("Done.");
        }
      }
    }
    

    Zeebe Go Client

    Get Started with the Go client

    In this tutorial, you will learn to use the Go client in a Go application to interact with Zeebe.

    You will be guided through the following steps:

    You can find the complete source code on GitHub.

    Prerequisites

    Before you begin to set up your project, please start the broker, e.g. by running the start up script bin/broker or bin/broker.bat in the distribution. By default, the broker binds to the address localhost:26500, which is used as the contact point in this guide. If your broker is available under another address, please adjust the broker contact point when building the client.

    Set up a project

    First, we need a new Go project. Create a new project using your IDE, or create a new Go module with:

    mkdir -p $GOPATH/src/github.com/zb-user/zb-example
    cd $GOPATH/src/github.com/zb-user/zb-example
    go mod init
    

    To use the Zeebe Go client library, add the following dependency to your go.mod:

    module github.com/zb-user/zb-example
    
    go 1.13
    
    require github.com/zeebe-io/zeebe/clients/go v0.24.1
    

    Create a main.go file inside the module and add the following lines to bootstrap the Zeebe client:

    package main
    
    import (
    	"context"
    	"fmt"
    	"github.com/zeebe-io/zeebe/clients/go/pkg/zbc"
    	"github.com/zeebe-io/zeebe/clients/go/pkg/pb"
    )
    
    const BrokerAddr = "0.0.0.0:26500"
    
    func main() {
    	client, err := zbc.NewClient(&zbc.ClientConfig{
          GatewayAddress:         BrokerAddr,
          UsePlaintextConnection: true,
    	})
    
    	if err != nil {
    		panic(err)
    	}
    
    	ctx := context.Background()
    	topology, err := client.NewTopologyCommand().Send(ctx)
    	if err != nil {
    		panic(err)
    	}
    
    	for _, broker := range topology.Brokers {
    		fmt.Println("Broker", broker.Host, ":", broker.Port)
    		for _, partition := range broker.Partitions {
    			fmt.Println("  Partition", partition.PartitionId, ":", roleToString(partition.Role))
    		}
    	}
    }
    
    func roleToString(role pb.Partition_PartitionBrokerRole) string {
    	switch role {
    	case pb.Partition_LEADER:
    		return "Leader"
    	case pb.Partition_FOLLOWER:
    		return "Follower"
    	default:
    		return "Unknown"
    	}
    }
    

    Run the program.

    go run main.go
    

    You should see similar output:

    Broker 0.0.0.0 : 26501
      Partition 1 : Leader
    

    Model a workflow

    Now, we need a first workflow which can then be deployed. Later, we will extend the workflow with more functionality.

    Open the Zeebe Modeler and create a new BPMN diagram. Add a start event and an end event to the diagram and connect the events.

    model-workflow-step-1

    Set the id to order-process (i.e., the BPMN process id) and mark the diagram as executable. Save the diagram in the project's source folder.

    Deploy a workflow

    Next, we want to deploy the modeled workflow to the broker. The broker stores the workflow under its BPMN process id and assigns a version (i.e., the revision).

    package main
    
    import (
    	"context"
    	"fmt"
    	"github.com/zeebe-io/zeebe/clients/go/pkg/zbc"
    )
    
    const brokerAddr = "0.0.0.0:26500"
    
    func main() {
    	client, err := zbc.NewClient(&zbc.ClientConfig{
          GatewayAddress:         brokerAddr,
          UsePlaintextConnection: true,
    	})
    
    	if err != nil {
    		panic(err)
    	}
    
    	ctx := context.Background()
    	response, err := client.NewDeployWorkflowCommand().AddResourceFile("order-process.bpmn").Send(ctx)
    	if err != nil {
    		panic(err)
    	}
    
    	fmt.Println(response.String())
    }
    

    Run the program and verify that the workflow is deployed successfully. You should see output similar to:

    key:2251799813686743 workflows:<bpmnProcessId:"order-process" version:3 workflowKey:2251799813686742 resourceName:"order-process.bpmn" >
    

    Create a workflow instance

    Finally, we are ready to create a first instance of the deployed workflow. A workflow instance is created from a specific version of the workflow, which can be set on creation.

    package main
    
    import (
    	"context"
    	"fmt"
    	"github.com/zeebe-io/zeebe/clients/go/pkg/zbc"
    )
    
    const brokerAddr = "0.0.0.0:26500"
    
    func main() {
    	client, err := zbc.NewClient(&zbc.ClientConfig{
          GatewayAddress:         brokerAddr,
          UsePlaintextConnection: true,
    	})
    
    	if err != nil {
    		panic(err)
    	}
    
    	// After the workflow is deployed.
    	variables := make(map[string]interface{})
    	variables["orderId"] = "31243"
    
    	request, err := client.NewCreateInstanceCommand().BPMNProcessId("order-process").LatestVersion().VariablesFromMap(variables)
    	if err != nil {
    		panic(err)
    	}
    
    	ctx := context.Background()
    	msg, err := request.Send(ctx)
    	if err != nil {
    		panic(err)
    	}
    
    	fmt.Println(msg.String())
    }
    

    Run the program and verify that the workflow instance is created. You should see the output:

    workflowKey:2251799813686742 bpmnProcessId:"order-process" version:3 workflowInstanceKey:2251799813686744
    

    You did it! Do you want to see how the workflow instance is executed?

    Start the Zeebe Monitor using java -jar zeebe-simple-monitor-app-*.jar.

    Open a web browser and go to http://localhost:8080/.

    Here, you see the current state of the workflow instance. zeebe-monitor-step-1

    Work on a task

    Now we want to do some work within the workflow. First, add a few service tasks to the BPMN diagram and set the required attributes. Then extend your main.go file and activate jobs which are created when the workflow instance reaches a service task.

    Open the BPMN diagram in the Zeebe Modeler. Insert a few service tasks between the start and the end event.

    model-workflow-step-2

    You need to set the type of each task, which identifies the nature of the work to be performed. Set the type of the first task to payment-service.

    Add the following lines to redeploy the modified process, then activate and complete a job of the first task type:

    package main
    
    import (
    	"context"
    	"fmt"
    	"github.com/zeebe-io/zeebe/clients/go/pkg/entities"
    	"github.com/zeebe-io/zeebe/clients/go/pkg/worker"
    	"github.com/zeebe-io/zeebe/clients/go/pkg/zbc"
    	"log"
    )
    
    const BrokerAddr = "0.0.0.0:26500"
    
    var readyClose = make(chan struct{})
    
    func main() {
    	client, err := zbc.NewClient(&zbc.ClientConfig{
    		GatewayAddress:         BrokerAddr,
    		UsePlaintextConnection: true,
    	})
    	if err != nil {
    		panic(err)
    	}
    
    	// deploy workflow
    	ctx := context.Background()
    	response, err := client.NewDeployWorkflowCommand().AddResourceFile("order-process-4.bpmn").Send(ctx)
    	if err != nil {
    		panic(err)
    	}
    
    	fmt.Println(response.String())
    
    	// create a new workflow instance
    	variables := make(map[string]interface{})
    	variables["orderId"] = "31243"
    
    	request, err := client.NewCreateInstanceCommand().BPMNProcessId("order-process-4").LatestVersion().VariablesFromMap(variables)
    	if err != nil {
    		panic(err)
    	}
    
    	result, err := request.Send(ctx)
    	if err != nil {
    		panic(err)
    	}
    
    	fmt.Println(result.String())
    
    	jobWorker := client.NewJobWorker().JobType("payment-service").Handler(handleJob).Open()
    
    	<-readyClose
    	jobWorker.Close()
    	jobWorker.AwaitClose()
    }
    
    func handleJob(client worker.JobClient, job entities.Job) {
    	jobKey := job.GetKey()
    
    	headers, err := job.GetCustomHeadersAsMap()
    	if err != nil {
    		// failed to handle job as we require the custom job headers
    		failJob(client, job)
    		return
    	}
    
    	variables, err := job.GetVariablesAsMap()
    	if err != nil {
    		// failed to handle job as we require the variables
    		failJob(client, job)
    		return
    	}
    
    	variables["totalPrice"] = 46.50
    	request, err := client.NewCompleteJobCommand().JobKey(jobKey).VariablesFromMap(variables)
    	if err != nil {
    		// failed to set the updated variables
    		failJob(client, job)
    		return
    	}
    
    	log.Println("Complete job", jobKey, "of type", job.Type)
    	log.Println("Processing order:", variables["orderId"])
    	log.Println("Collect money using payment method:", headers["method"])
    
    	ctx := context.Background()
    	_, err = request.Send(ctx)
    	if err != nil {
    		panic(err)
    	}
    
    	log.Println("Successfully completed job")
    	close(readyClose)
    }
    
    func failJob(client worker.JobClient, job entities.Job) {
    	log.Println("Failed to complete job", job.GetKey())
    
    	ctx := context.Background()
    	_, err := client.NewFailJobCommand().JobKey(job.GetKey()).Retries(job.Retries - 1).Send(ctx)
    	if err != nil {
    		panic(err)
    	}
    }
    

    In this example, we open a job worker for jobs of type payment-service. The job worker repeatedly polls for new jobs of this type and activates them. Each activated job is passed to the job handler, which implements the business logic of the job worker. The handler then completes the job with its result, or fails the job if it encounters a problem while processing it.

    When you have a look at the Zeebe Monitor, you can see that the workflow instance has moved from the first service task to the next one:

    zeebe-monitor-step-2

    When you run the above example you should see similar output:

    key:2251799813686751 workflows:<bpmnProcessId:"order-process" version:4 workflowKey:2251799813686750 resourceName:"order-process.bpmn" >
    workflowKey:2251799813686750 bpmnProcessId:"order-process" version:4 workflowInstanceKey:2251799813686752
    2019/06/06 20:59:50 Complete job 2251799813686760 of type payment-service
    2019/06/06 20:59:50 Processing order: 31243
    2019/06/06 20:59:50 Collect money using payment method: VISA
    

    What's next?

    Yay! You finished this tutorial and learned the basic usage of the Go client.

    Next steps:

    Other Clients

    In addition to the core Java and Go clients provided by Zeebe, there are a number of community-maintained Zeebe client libraries.

    C#

    The C# client is a community library maintained by Christopher Zell.

    JavaScript

    Zeebe Node

    The Zeebe Node client is maintained by Josh Wulf. It can be used to create Node.js applications.

    NestJS client

    The NestJS client is maintained by Dan Shapir. It is a microservice transport that integrates Zeebe with the NestJS framework.

    Node-RED

    The Node-RED Zeebe client is maintained by Patrick Dehn.

    Workit Zeebe Client

    The Workit Zeebe client is maintained by Olivier Albertini. It allows you to run the same application code against Zeebe or the Camunda engine, based on configuration settings.

    Zeebe ElasticSearch client

    The Zeebe ElasticSearch client is maintained by Olivier Albertini. It provides an API for querying Zeebe's ElasticSearch export.

    Python

    The Python client is maintained by Stéphane Ludwig.

    Ruby

    The Ruby client is maintained by Christian Nicolai.

    Rust

    The Rust client, Zeebest, is maintained by Mackenzie Clark.

    Operations

    Development

    We recommend using Docker during development. This gives you a consistent, repeatable development environment.

    Production

    In production, we recommend using Kubernetes and container images. This provides you with predictable and consistent configuration, and the ability to manage deployments using automation tools.

    Tools For Monitoring And Managing Workflows

    Operate is a tool that was built for monitoring and managing Zeebe workflows. We walk through how to install Operate in the "Getting Started" tutorial.

    The current Operate release is a developer preview and is available for non-production use only. You can find the Operate preview license here.

    We plan to release Operate under an enterprise license for production use in the future.

    Alternatively:

    • There's a community project called Simple Monitor that can also be used to inspect deployed workflows and workflow instances. Simple Monitor is not intended for production use, but can be useful during development for debugging.
    • It's possible to combine Kibana with Zeebe's Elasticsearch exporter to create a dashboard for monitoring the state of Zeebe.

    Configuration

    Zeebe can be configured through:

    • configuration files,
    • environment variables,
    • or a mix of both.

    If both configuration files and environment variables are present, then environment variables overwrite settings in configuration files.

    If you want to make small changes to the configuration, we recommend using environment variables. If you want to make big changes to the configuration, we recommend using a configuration file.

    The configuration will be applied during startup of Zeebe. It is not possible to change the configuration at runtime.

    Default Configuration

    The default configuration is located in config/application.yaml. This configuration contains the most common configuration settings for a standalone broker. It also lists the corresponding environment variable for each setting.

    Note

    The default configuration is not suitable for a standalone gateway node. If you want to run a standalone gateway node, please have a look at /config/gateway.yaml.template.

    Configuration file templates

    We provide templates that contain all possible configuration settings, along with explanations for each setting:

    Note that these templates also include the corresponding environment variables to use for every setting.

    Editing the configuration

    You can either start from scratch or start from the configuration templates listed above.

    If you use a configuration template and want to uncomment certain lines, make sure to also uncomment their parent elements:

    Valid Configuration
    
        zeebe:
          gateway:
            network:
              # host: 0.0.0.0
              port: 26500
    
    Invalid configuration
    
        # zeebe:
          # gateway:
            # network:
              # host: 0.0.0.0
              port: 26500
    

    Uncommenting individual lines is a bit finicky, because YAML is sensitive to indentation. The best way to do it is to position the cursor before the # character and delete two characters (the # and the space that follows it). Doing this consistently will give you a valid YAML file.

    When it comes to editing individual settings two data types are worth mentioning:

    • Data Sizes (e.g. logSegmentSize)
      • Human friendly format: 500MB (or KB, GB)
      • Machine friendly format: size in bytes as long
    • Timeouts/Intervals (e.g. requestTimeout)
      • Human friendly format: 15s (or m, h)
      • Machine friendly format: either duration in milliseconds as long, or ISO-8601 Duration format (e.g. PT15S)

    Passing Configuration Files to Zeebe

    Rename the configuration file to application.yaml and place it in the following location:

    ./config/application.yaml
    

    Other ways to specify the configuration file

    Zeebe uses Spring Boot for its configuration parsing. So all other ways to configure a Spring Boot application should also work. In particular, you can use:

    • SPRING_CONFIG_ADDITIONAL_LOCATION to specify an additional configuration file.
    • SPRING_APPLICATION_JSON to specify settings in JSON format.

    Details can be found in the Spring documentation.

    Note

    We recommend not using SPRING_CONFIG_LOCATION, as this will replace all existing configuration defaults. When used inappropriately, some features will be disabled or will not be configured properly.

    If you specify SPRING_CONFIG_LOCATION, then specify it like this:

    export SPRING_CONFIG_LOCATION='classpath:/,file:./[path to config file]'
    

    This will ensure that the defaults defined in the classpath resources will be used (unless explicitly overwritten by the configuration file you provide). If you omit the defaults defined in the classpath, some features may be disabled or will not be configured properly.

    Verifying that configuration was applied

    To verify that the configuration was applied, start Zeebe and look at the log.

    If the configuration could be read, Zeebe will log out the effective configuration during startup:

    17:13:13.120 [] [main] INFO  io.zeebe.broker.system - Starting broker 0 with configuration {
      "network": {
        "host": "0.0.0.0",
        "portOffset": 0,
        "maxMessageSize": {
          "bytes": 4194304
        },
        "commandApi": {
          "defaultPort": 26501,
          "host": "0.0.0.0",
          "port": 26501,
    ...
    

    In some cases of invalid configuration Zeebe will fail to start with a warning that explains which configuration setting could not be read.

    17:17:38.796 [] [main] ERROR org.springframework.boot.diagnostics.LoggingFailureAnalysisReporter -
    
    ***************************
    APPLICATION FAILED TO START
    ***************************
    
    Description:
    
    Failed to bind properties under 'zeebe.broker.network.port-offset' to int:
    
        Property: zeebe.broker.network.port-offset
        Value: false
        Origin: System Environment Property "ZEEBE_BROKER_NETWORK_PORTOFFSET"
        Reason: failed to convert java.lang.String to int
    
    Action:
    
    Update your application's configuration
    

    Logging

    Zeebe uses the Log4j2 framework for logging. In the distribution and the Docker image, you can find the default log configuration file in config/log4j2.xml.

    Google Stackdriver (JSON) logging

    To enable Google Stackdriver compatible JSON logging you can set the environment variable ZEEBE_LOG_APPENDER=Stackdriver before starting Zeebe.

    Default logging configuration

    • config/log4j2.xml (applied by default)
    <?xml version="1.0" encoding="UTF-8"?>
    <Configuration status="WARN" shutdownHook="disable">
    
      <Properties>
        <Property name="log.path">${sys:app.home}/logs</Property>
        <Property name="log.pattern">%d{yyyy-MM-dd HH:mm:ss.SSS} [%X{actor-name}] [%t] %-5level %logger{36} - %msg%n</Property>
        <Property name="log.stackdriver.serviceName">${env:ZEEBE_LOG_STACKDRIVER_SERVICENAME:-}</Property>
        <Property name="log.stackdriver.serviceVersion">${env:ZEEBE_LOG_STACKDRIVER_SERVICEVERSION:-}</Property>
      </Properties>
    
      <Appenders>
        <Console name="Console" target="SYSTEM_OUT">
          <PatternLayout
            pattern="${log.pattern}"/>
        </Console>
    
        <Console name="Stackdriver" target="SYSTEM_OUT">
          <StackdriverLayout serviceName="${log.stackdriver.serviceName}"
            serviceVersion="${log.stackdriver.serviceVersion}" />
        </Console>
    
        <RollingFile name="RollingFile" fileName="${log.path}/zeebe.log"
          filePattern="${log.path}/zeebe-%d{yyyy-MM-dd}-%i.log.gz">
          <PatternLayout>
            <Pattern>${log.pattern}</Pattern>
          </PatternLayout>
          <Policies>
            <TimeBasedTriggeringPolicy/>
            <SizeBasedTriggeringPolicy size="250 MB"/>
          </Policies>
        </RollingFile>
      </Appenders>
    
      <Loggers>
        <Logger name="io.zeebe" level="${env:ZEEBE_LOG_LEVEL:-info}"/>
    
        <Logger name="io.atomix" level="${env:ATOMIX_LOG_LEVEL:-warn}"/>
    
        <Root level="info">
          <AppenderRef ref="RollingFile"/>
    
          <!-- remove to disable console logging -->
          <AppenderRef ref="${env:ZEEBE_LOG_APPENDER:-Console}"/>
        </Root>
      </Loggers>
    
    </Configuration>
    

    Change log level dynamically

    Zeebe brokers expose a Spring Boot Actuator web endpoint for configuring loggers dynamically. To change the log level of a logger, make a POST request to the /actuator/loggers/{logger.name} endpoint as shown in the example below. Change io.zeebe to the required logger name and debug to the required log level.

    curl 'http://localhost:9600/actuator/loggers/io.zeebe' -i -X POST -H 'Content-Type: application/json' -d '{"configuredLevel":"debug"}'
    

    Health Probes

    Health probes are set to sensible defaults which cover common use cases.

    For specific use cases, it might be necessary to customize the health probes.

    Resource Planning

    The short answer to “what resources and configuration will I need to take Zeebe to production?” is: it depends.

    While we cannot tell you exactly what you need - beyond it depends - we can explain what depends, what it depends on, and how it depends on it.

    Disk Space

    All Brokers in a partition use disk space to store:

    • The event log for each partition they participate in. By default, this is a minimum of 512MB for each partition, incrementing in 512MB segments. The event log is truncated on a given broker when data has been processed and successfully exported by all loaded exporters.
    • Periodic snapshots of the running state (in-flight data) of each partition (unbounded, based on in-flight work).

    Additionally, the leader of a partition also uses disk space to store:

    • A projection of the running state of the partition in RocksDB. (unbounded, based on in-flight work)

    To calculate the required amount of disk space, the following "back of the envelope" formula can be used as a starting point:

    neededDiskSpace = replicatedState + localState
    
    replicatedState = totalEventLogSize + totalSnapshotSize
    
    totalEventLogSize = followerPartitionsPerNode * eventLogSize * reserveForPartialSystemFailure 
    
    totalSnapshotSize = partitionsPerNode * singleSnapshotSize * 2 
    // singleSnapshotSize * 2: 
    //   the last snapshot (already replicated) +
    //   the next snapshot (in transit, while it is being replicated) 
    
    partitionsPerNode = leaderPartitionsPerNode + followerPartitionsPerNode
    
    leaderPartitionsPerNode = partitionsCount / numberOfNodes 
    followerPartitionsPerNode = partitionsCount * replicationFactor / numberOfNodes 
    
    numberOfNodes = [number of broker nodes] (i.e. the clusterSize)
    partitionsCount = [number of partitions]
    replicationFactor = [number of replicas per partition]
    reserveForPartialSystemFailure = [factor to account for partial system failure]  
    singleSnapshotSize = [size of a single rocks DB snapshot]                  
    eventLogSize = [event log size for duration of snapshotPeriod] 
    

    Some observations on the scaling of the factors above:

    • eventLogSize: This factor scales with the throughput of your system
    • totalSnapshotSize: This factor scales with the number of in-flight workflows
    • reserveForPartialSystemFailure: This factor is supposed to be a reserve to account for partial system failure (e.g. loss of quorum inside Zeebe cluster, or loss of connection to external system). See the remainder of this document for a further discussion on the effects of partial system failure on Zeebe cluster and disk space provisioning.
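
    To make the formula concrete, here is a purely hypothetical back-of-the-envelope calculation. All input values below are assumptions chosen for illustration, not recommendations:

    // assumed example values
    numberOfNodes                  = 3
    partitionsCount                = 3
    replicationFactor              = 3
    eventLogSize                   = 2GB (per partition, per snapshot period)
    singleSnapshotSize             = 1GB
    reserveForPartialSystemFailure = 2

    leaderPartitionsPerNode   = 3 / 3 = 1
    followerPartitionsPerNode = 3 * 3 / 3 = 3
    partitionsPerNode         = 1 + 3 = 4

    totalEventLogSize = 3 * 2GB * 2 = 12GB
    totalSnapshotSize = 4 * 1GB * 2 = 8GB
    replicatedState   = 12GB + 8GB = 20GB

    neededDiskSpace = 20GB + localState
    // localState is the RocksDB state on leader partitions;
    // see the RocksDB section below for a rough size estimate.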

    Many of the factors influencing above formula can be fine-tuned in the configuration. The relevant configuration settings are:

    Config file
        zeebe:
          broker:
            data:
              logSegmentSize: 512MB
              snapshotPeriod: 15m
            cluster:
              partitionsCount: 1
              replicationFactor: 1
              clusterSize: 1
    
    Environment Variables
      ZEEBE_BROKER_DATA_LOGSEGMENTSIZE = 512MB
      ZEEBE_BROKER_DATA_SNAPSHOTPERIOD = 15m
      ZEEBE_BROKER_CLUSTER_PARTITIONSCOUNT = 1
      ZEEBE_BROKER_CLUSTER_REPLICATIONFACTOR = 1
      ZEEBE_BROKER_CLUSTER_CLUSTERSIZE = 1
    

    Other factors can be observed in a production-like system with representative throughput.

    If you want to know where to look, by default this data is stored in:

    • segments - the data of the log split into segments. The log is only appended - its data can be deleted when it becomes part of a new snapshot.
    • state - the active state. Deployed workflows, active workflow instances, etc. Completed workflow instances or jobs are removed.
    • snapshot - a state at a certain point in time

    Pitfalls

    If you want to avoid exploding your disk space usage, here are a few pitfalls to avoid:

    • Do not create a high number of snapshots with a long period between them.
    • Do not configure an exporter which does not advance its record position (such as the Debug Exporter)

    If you do configure an exporter, make sure to monitor its availability and health, as well as the availability and health of the external system it depends on. This is the Achilles' heel of the cluster. If data cannot be exported, it cannot be removed from the cluster and will accumulate on disk. See "Effect of exporters and external system failure" further on in this document for an explanation and possible buffering strategies.

    Event Log

    The event log for each partition is segmented. By default, the segment size is 512MB.

    The event log will grow over time, unless and until individual event log segments are deleted.

    An event log segment can be deleted once:

    • all the events it contains have been processed by exporters,
    • all the events it contains have been replicated to other brokers,
    • all the events it contains have been processed, and
    • the maximum number of snapshots has been reached.

    The following conditions inhibit the automatic deletion of event log segments:

    • A cluster loses its quorum. In this case events are queued but not processed. Once a quorum is reestablished, events will be replicated and eventually event log segments will be deleted.
    • The max number of snapshots has not been written. Log segment deletion will begin as soon as the max number of snapshots has been reached.
    • An exporter does not advance its read position in the event log. In this case the event log will grow ad infinitum.

    An event log segment is not deleted until all the events in it have been exported by all configured exporters. This means that exporters that rely on side-effects, perform intensive computation, or experience back pressure from external storage will cause disk usage to grow, as they delay the deletion of event log segments.

    Exporting is only performed on the partition leader, but the followers of the partition do not delete segments in their replica of the partition until the leader marks all events in it as unneeded by exporters.

    We make sure that event log segments are not deleted too early. No event log segment is deleted until a snapshot has been taken that includes that segment. When a snapshot has been taken, the event log is only deleted up to that point.

    Snapshots

    The running state of the partition is captured periodically on the leader in a snapshot. By default, this period is every 15 minutes. This can be changed in the configuration.

    A snapshot is a projection of all events that represent the current running state of the workflows running on the partition. It contains all active data, for example, deployed workflows, active workflow instances, and not yet completed jobs.

    When the broker has written a new snapshot, it deletes all data on the log which was written before the latest snapshot.

    RocksDB

    On the lead broker of a partition, the current running state is kept in memory, and on disk in RocksDB. In our experience this grows to 2GB under a heavy load of long-running processes. The snapshots that are replicated to followers are snapshots of RocksDB.

    Effect of exporters and external system failure

    If an external system relied on by an exporter fails - for example, if you are exporting data to ElasticSearch and the connection to the ElasticSearch cluster fails - then the exporter will not advance its position in the event log, and brokers cannot truncate their logs. The broker event log will grow until the exporter is able to re-establish the connection and export the data. To ensure that your brokers are resilient in the event of external system failure, give them sufficient disk space to continue operating without truncating the event log until the connection to the external system is restored.

    Effect on exporters of node failure

    Only the leader of a partition exports events. Only committed events (events that have been replicated) are passed to exporters. The exporter will then update its read position. The exporter read position is only replicated between brokers in the snapshot. It is not itself written to the event log. This means that an exporter’s current position cannot be reconstructed from the replicated event log, only from a snapshot.

    When a partition fails over to a new leader, the new leader is able to construct the current partition state by projecting the event log from the point of the last snapshot. The position of exporters cannot be reconstructed from the event log, so it is set to the last snapshot. This means that an exporter can see the same events twice in the event of a fail-over.

    You should assign idempotent ids to events in your exporter if this is an issue for your system. The combination of record position and partition id is reliable as a unique id for an event.
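
    For example, a downstream consumer of exported records could deduplicate on a key built from these two values. The following is a minimal sketch of the idea in Go (the exporter API itself is Java, so this only illustrates the key construction):

    package exporterdedup

    import "fmt"

    // DedupKey derives an idempotent identifier for an exported record from its
    // partition id and record position; together they uniquely identify an event,
    // so a consumer can safely ignore duplicates seen again after a fail-over.
    func DedupKey(partitionID int32, position int64) string {
        return fmt.Sprintf("%d-%d", partitionID, position)
    }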

    Effect of quorum loss

    If a partition goes under quorum (for example: if two nodes in a three node cluster go down), then the leader of the partition will continue to accept requests, but these requests will not be replicated and will not be marked as committed. In this case, they cannot be truncated. This causes the event log to grow. The amount of disk space needed to continue operating in this scenario is a function of the broker throughput and the amount of time to quorum being restored. You should ensure that your nodes have sufficient disk space to handle this failure mode.

    Network Ports

    The broker cluster sits behind the gRPC Gateway, which handles all requests from clients/workers and forwards events to brokers.

    Gateway

    The gateway needs to receive communication on

    • zeebe.gateway.network.port: 26500 from clients/workers, and
    • zeebe.gateway.cluster.contactPoint: 127.0.0.1:26502 from brokers

    The relevant configuration settings are:

    Config file
        zeebe:
          gateway:
            network:
              port: 26500
            cluster:
              contactPoint: 127.0.0.1:26502
            
    
    Environment Variables
      ZEEBE_GATEWAY_NETWORK_PORT = 26500
      ZEEBE_GATEWAY_CLUSTER_CONTACTPOINT = 127.0.0.1:26502  
    

    Broker

    The broker needs to receive communication from the gateway and from other brokers. It also exposes a port for monitoring.

    • zeebe.broker.network.commandApi.port: 26501: Gateway-to-broker communication, using an internal SBE (Simple Binary Encoding) protocol. This is the Command API port. This should be exposed to the gateway.
    • zeebe.broker.network.internalApi.port: 26502: Inter-broker clustering using the Gossip and Raft protocols for partition replication, broker elections, topology sharing, and message subscriptions. This should be exposed to other brokers and the gateway.
    • zeebe.broker.network.monitoringApi.port: 9600: Metrics and Readiness Probe. Prometheus metrics are exported on the route /metrics. There is a readiness probe on /ready.

    The relevant configuration settings are:

    Config file
        zeebe:
          broker:
            network:
              commandApi:
                port: 26501
              internalApi:
                port: 26502
              monitoringApi:
                port: 9600
    
    Environment Variables
      ZEEBE_BROKER_NETWORK_COMMANDAPI_PORT = 26501
      ZEEBE_BROKER_NETWORK_INTERNALAPI_PORT = 26502
      ZEEBE_BROKER_NETWORK_MONITORINGAPI_PORT = 9600
    

    Setting up a Zeebe Cluster

    To set up a cluster, you need to adjust the cluster section in the Zeebe configuration file. Below is a snippet of the default Zeebe configuration file; it should be self-explanatory.

    ...
        cluster:
          # This section contains all cluster related configurations, to setup a zeebe cluster
    
          # Specifies the unique id of this broker node in a cluster.
          # The id should be between 0 and number of nodes in the cluster (exclusive).
          #
          # This setting can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_NODEID.
          nodeId: 0
    
          # Controls the number of partitions, which should exist in the cluster.
          #
          # This can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_PARTITIONSCOUNT.
          partitionsCount: 1
    
          # Controls the replication factor, which defines the count of replicas per partition.
          # The replication factor cannot be greater than the number of nodes in the cluster.
          #
          # This can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_REPLICATIONFACTOR.
          replicationFactor: 1
    
          # Specifies the zeebe cluster size. This value is used to determine which broker
          # is responsible for which partition.
          #
          # This can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_CLUSTERSIZE.
          clusterSize: 1
    
          # Allows to specify a list of known other nodes to connect to on startup
          # The contact points of the internal network configuration must be specified.
          # The format is [HOST:PORT]
          # Example:
          # initialContactPoints : [ 192.168.1.22:26502, 192.168.1.32:26502 ]
          #
          # To guarantee the cluster can survive network partitions, all nodes must be specified
          # as initial contact points.
          #
          # This setting can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_INITIALCONTACTPOINTS
          # specifying a comma-separated list of contact points.
          # Default is empty list:
          initialContactPoints: []
    
          # Allows to specify a name for the cluster
          # This setting can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_CLUSTERNAME.
          # Example:
          clusterName: zeebe-cluster
    

    Example

    In this example, we will set up a Zeebe cluster with five brokers. Each broker needs to get a unique node id. To scale well, we will bootstrap five partitions with a replication factor of three. For more information about this, please take a look into the Clustering section.

    The clustering setup will look like this:

    cluster

    Configuration

    The configuration of the first broker could look like this:

    ...
      cluster:
        nodeId: 0
        partitionsCount: 5
        replicationFactor: 3
        clusterSize: 5
        initialContactPoints: [
          ADDRESS_AND_PORT_OF_NODE_0,
          ADDRESS_AND_PORT_OF_NODE_1,
          ADDRESS_AND_PORT_OF_NODE_2,
          ADDRESS_AND_PORT_OF_NODE_3,
          ADDRESS_AND_PORT_OF_NODE_4
        ]
    

    For the other brokers, the configuration changes only slightly.

    ...
      cluster:
        nodeId: NODE_ID
        partitionsCount: 5
        replicationFactor: 3
        clusterSize: 5
        initialContactPoints: [
          ADDRESS_AND_PORT_OF_NODE_0,
          ADDRESS_AND_PORT_OF_NODE_1,
          ADDRESS_AND_PORT_OF_NODE_2,
          ADDRESS_AND_PORT_OF_NODE_3,
          ADDRESS_AND_PORT_OF_NODE_4
        ]
    
    

    Each broker needs a unique node id. The ids should be in the range from zero to clusterSize - 1. You need to replace the NODE_ID placeholder with an appropriate value. Furthermore, the brokers need an initial contact point to start their gossip conversation. Make sure that you use the address and management port of another broker. You need to replace the ADDRESS_AND_PORT_OF_NODE_0 placeholder (and the corresponding placeholders for the other nodes).

    To guarantee that a cluster can properly recover from network partitions, it is currently required that all nodes be specified as initial contact points. It is not necessary for a broker to list itself as initial contact point, but it is safe to do so, and probably simpler to maintain.

    Partitions bootstrapping

    On bootstrap, each node will create a partition matrix.

    This matrix depends on the partitions count, replication factor and the cluster size. If you did the configuration right and used the same values for partitionsCount, replicationFactor and clusterSize on each node, then all nodes will generate the same partition matrix.

    For the current example the matrix will look like the following:

                Node 0     Node 1     Node 2     Node 3     Node 4
    Partition 0 Leader     Follower   Follower   -          -
    Partition 1 -          Leader     Follower   Follower   -
    Partition 2 -          -          Leader     Follower   Follower
    Partition 3 Follower   -          -          Leader     Follower
    Partition 4 Follower   Follower   -          -          Leader

    The matrix ensures that the partitions are well distributed between the different nodes. Furthermore, it guarantees that each node knows exactly which partitions it has to bootstrap and for which it will become the leader at first (this could change later, for example if the node needs to step down).
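
    The following is an illustrative sketch of the round-robin assignment implied by the example matrix above; it is not Zeebe's actual bootstrapping code:

    package main

    import "fmt"

    func main() {
        const clusterSize, partitionsCount, replicationFactor = 5, 5, 3

        // The leader of each partition is chosen round-robin across the nodes;
        // the followers are the next replicationFactor-1 nodes, wrapping around.
        for p := 0; p < partitionsCount; p++ {
            leader := p % clusterSize
            fmt.Printf("Partition %d: Node %d (Leader)", p, leader)
            for r := 1; r < replicationFactor; r++ {
                fmt.Printf(", Node %d (Follower)", (leader+r)%clusterSize)
            }
            fmt.Println()
        }
    }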

    Keep Alive Intervals

    It's possible to specify how often Zeebe clients should send keep alive pings. By default, the official Zeebe clients (Java and Go) send keep alive pings every 45 seconds. This interval can be configured through the clients' APIs and through the ZEEBE_KEEP_ALIVE environment variable. When configuring the clients with the environment variable, the time interval must be expressed as a positive number of milliseconds (e.g., 45000).
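
    For example, here is a minimal sketch of configuring the interval in the Go client; it assumes the KeepAlive field on zbc.ClientConfig (a time.Duration), which is available in recent versions of the Go client:

    package main

    import (
        "time"

        "github.com/zeebe-io/zeebe/clients/go/pkg/zbc"
    )

    func main() {
        // Send keep alive pings every 30 seconds instead of the 45 second default.
        // KeepAlive is assumed to exist on zbc.ClientConfig in this client version.
        client, err := zbc.NewClient(&zbc.ClientConfig{
            GatewayAddress:         "127.0.0.1:26500",
            UsePlaintextConnection: true,
            KeepAlive:              30 * time.Second,
        })
        if err != nil {
            panic(err)
        }
        defer client.Close()
    }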

    It's also possible to specify the minimum interval allowed by the gateway before it terminates the connection. By default, gateways terminate connections if they receive more than two pings with an interval of less than 30 seconds. This minimum interval can be modified by editing the network section of the respective configuration file or by setting the ZEEBE_GATEWAY_NETWORK_MINKEEPALIVEINTERVAL environment variable.

    The Metrics

    When operating a distributed system like Zeebe, it is important to put proper monitoring in place. To facilitate this, Zeebe exposes an extensive set of metrics.

    Zeebe exposes metrics over an embedded HTTP server.

    Types of metrics

    • Counters: a time series that records a growing count of some unit. Examples: number of bytes transmitted over the network, number of workflow instances started, ...
    • Gauges: a time series that records the current size of some unit. Examples: number of currently open client connections, current number of partitions, ...

    Metrics Format

    Zeebe exposes metrics directly in Prometheus text format. The details of the format can be read in the Prometheus documentation.

    Example:

    # HELP zeebe_stream_processor_events_total Number of events processed by stream processor
    # TYPE zeebe_stream_processor_events_total counter
    zeebe_stream_processor_events_total{action="written",partition="1",} 20320.0
    zeebe_stream_processor_events_total{action="processed",partition="1",} 20320.0
    zeebe_stream_processor_events_total{action="skipped",partition="1",} 2153.0
    

    Configuring Metrics

    The HTTP server to export the metrics can be configured in the configuration file.

    Connecting Prometheus

    As explained, Zeebe exposes the metrics over an HTTP server. The default port is 9600.

    Add the following entry to your prometheus.yml:

    - job_name: zeebe
      scrape_interval: 15s
      metrics_path: /metrics
      scheme: http
      static_configs:
      - targets:
        - localhost:9600
    

    Available Metrics

    All Zeebe-related metrics have a zeebe_ prefix.

    Most metrics have the following common label:

    • partition: cluster-unique id of the partition

    Metrics related to workflow processing:

    • zeebe_stream_processor_events_total: The number of events processed by the stream processor. The action label separates processed, skipped and written events.
    • zeebe_exporter_events_total: The number of events processed by the exporter processor. The action label separates exported and skipped events.
    • zeebe_element_instance_events_total: The number of occurred workflow element instance events. The action label separates the number of activated, completed and terminated elements. The type label separates different BPMN element types.
    • zeebe_running_workflow_instances_total: The number of currently running workflow instances, i.e. not completed or terminated.
    • zeebe_job_events_total: The number of job events. The action label separates the number of created, activated, timed out, completed, failed and canceled jobs.
    • zeebe_pending_jobs_total: The number of currently pending jobs, i.e. not completed or terminated.
    • zeebe_incident_events_total: The number of incident events. The action label separates the number of created and resolved incident events.
    • zeebe_pending_incidents_total: The number of currently pending incidents, i.e. not resolved.

    Metrics related to performance:

    Zeebe has a backpressure mechanism by which it rejects requests when it receives more requests than it can handle without incurring high processing latency. The following metrics can be used to monitor backpressure and the processing latency of commands.

    • zeebe_dropped_request_count_total: The number of user requests rejected by the broker due to backpressure.
    • zeebe_backpressure_requests_limit: The limit for the number of inflight requests used for backpressure.
    • zeebe_stream_processor_latency_bucket: The processing latency for commands and events.

    Metrics related to health:

    The health of partitions in a broker can be monitored by the metric zeebe_health.

    Grafana

    Zeebe comes with a pre-built dashboard, available in the repository: monitor/grafana/zeebe.json

    Import it into your Grafana instance, then select the correct Prometheus data source (important if you have more than one), and you should be greeted with the following dashboard:

    cluster

    Deploying to Kubernetes

    We recommend that you use Kubernetes when deploying Zeebe to production.

    Zeebe needs to be deployed as a StatefulSet in order to preserve the identity of cluster nodes. StatefulSets require persistent storage, which needs to be allocated in advance and which differs depending on your cloud provider.

    In the zeebe-kubernetes repository you will find example Kubernetes manifests that configure a three-broker cluster with the Elasticsearch exporter and the Operate preview. Examples are provided for provisioning storage on Google Cloud Platform and Microsoft Azure.

    There are many ways to provision and configure a Kubernetes cluster, and there are a number of architectural choices you need to make: will your workers run inside the Kubernetes cluster or outside of it?

    You will need to configure your Kubernetes cluster and modify this to suit the architecture you are building.

    Gateway

    The Zeebe gateway is deployed as a stateless service.

    We support Kubernetes startup and liveness probes for the Zeebe gateway.

    Security Configuration

    Zeebe supports two security features that you should be aware of:

    • Authentication - allows you to secure communication between clients and gateways;
    • Authorization - allows you to supply access credentials to the client so these can be validated by a reverse proxy placed before the gateway.

    Authentication

    Zeebe supports transport layer security between the gateway and all of the officially supported clients. In this section, we will go through how to configure these components.

    Gateway

    Transport layer security in the gateway is disabled by default. This means that if you are just experimenting with Zeebe or in development, there is no configuration needed. However, if you want to enable authentication, you can configure Zeebe in the security section of the configuration files. The following configurations are present in both gateway.yaml.template and broker.standalone.yaml.template; the file you should edit depends on whether you are using a standalone gateway or an embedded gateway.

    ...
      security:
        # Enables TLS authentication between clients and the gateway
        enabled: false
    
        # Sets the path to the certificate chain file
        certificateChainPath:
    
        # Sets the path to the private key file location
        privateKeyPath:
    

    enabled should be either true or false, where true will enable TLS authentication between client and gateway, and false will disable it. certificateChainPath and privateKeyPath are used to configure the certificate with which the server will authenticate itself. certificateChainPath should be a file path pointing to a certificate chain in PEM format representing the server's certificate, and privateKeyPath a file path pointing to the certificate's PKCS8 private key, also in PEM format.

    Additionally, as you can see in the configuration file, each value can also be configured through an environment variable. The environment variable to use again depends on whether you are using a standalone gateway or an embedded gateway.

    Clients

    Unlike the gateway, TLS is enabled by default in all of Zeebe's supported clients. The following sections will show how to disable or properly configure each client.

    Note: Disabling TLS should only be done for testing or development. During production deployments, clients and gateways should be properly configured to establish secure connections.

    Java

    Without any configuration, the client will look in the system's certificate store for a CA certificate with which to validate the gateway's certificate chain. If you wish to use TLS without installing a certificate in the client's system, you can specify a CA certificate:

    public class SecureClient {
        public static void main(final String[] args) {
            final ZeebeClient client = ZeebeClient.newClientBuilder().caCertificatePath("path/to/certificate").build();
    
            // ...
        }
    }
    

    Alternatively, you can use the ZEEBE_CA_CERTIFICATE_PATH environment variable to override the code configuration.

    In order to disable TLS in a Java client, you can use the .usePlaintext() option:

    public class InsecureClient {
        public static void main(final String[] args) {
            final ZeebeClient client = ZeebeClient.newClientBuilder().usePlaintext().build();
    
            // ...
        }
    }
    

    Alternatively, you can use the ZEEBE_INSECURE_CONNECTION environment variable to override the code configuration. To enable an insecure connection, set it to "true". To use a secure connection, set it to any non-empty value other than "true". Setting the environment variable to an empty string is equivalent to unsetting it.

    Go

    Similarly to the Java client, if no CA certificate is specified then the client will look in the default location for a CA certificate with which to validate the gateway's certificate chain. It's also possible to specify a path to a CA certificate in the Go client:

    package main

    import (
        "github.com/zeebe-io/zeebe/clients/go/pkg/zbc"
    )

    func main() {
        // validate the gateway's certificate chain against the given CA certificate
        client, err := zbc.NewClient(&zbc.ClientConfig{
            CaCertificatePath: "path/to/certificate",
        })
        if err != nil {
            panic(err)
        }
        defer client.Close()

        // ...
    }
    

    To disable TLS, you can simply do:

    package main

    import (
        "github.com/zeebe-io/zeebe/clients/go/pkg/zbc"
    )

    func main() {
        // disable TLS - only do this for testing or development
        client, err := zbc.NewClient(&zbc.ClientConfig{
            UsePlaintextConnection: true,
        })
        if err != nil {
            panic(err)
        }
        defer client.Close()

        // ...
    }
    

    As in the Java client, you can use the ZEEBE_INSECURE_CONNECTION and ZEEBE_CA_CERTIFICATE_PATH environment variables to override these configurations.

    zbctl

    To configure zbctl to use a path to a CA certificate:

    ./zbctl --certPath /my/certificate/location <command> [arguments]
    

    To configure zbctl to disable TLS:

    ./zbctl --insecure <command> [arguments]
    

    Since zbctl is based on the Go client, setting the appropriate environment variables will override these parameters.

    Troubleshooting authentication issues

    Here we will describe a few ways that the clients and gateway could be misconfigured and what those errors look like. Hopefully, this will help you recognize these situations and provide you with an easy fix.

    TLS is enabled in zbctl but disabled in the gateway

    The client will fail with the following error:

    Error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: authentication handshake failed: tls: first record does not look like a TLS handshake"
    

    And the following error will be logged by Netty in the gateway:

    Aug 06, 2019 4:23:22 PM io.grpc.netty.NettyServerTransport notifyTerminated
    INFO: Transport failed
    io.netty.handler.codec.http2.Http2Exception: HTTP/2 client preface string missing or corrupt. Hex dump for received bytes: 1603010096010000920303d06091559c43ec48a18b50c028
      at io.netty.handler.codec.http2.Http2Exception.connectionError(Http2Exception.java:103)
      at io.netty.handler.codec.http2.Http2ConnectionHandler$PrefaceDecoder.readClientPrefaceString(Http2ConnectionHandler.java:306)
      at io.netty.handler.codec.http2.Http2ConnectionHandler$PrefaceDecoder.decode(Http2ConnectionHandler.java:239)
      at io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:438)
      at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:505)
      at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:444)
      at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:283)
      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
      at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
      at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1421)
      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
      at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:930)
      at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:794)
      at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:424)
      at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:326)
      at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
      at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
      at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
      at java.lang.Thread.run(Thread.java:748)
    

    Solution: Either enable TLS in the gateway as well or specify the --insecure flag when using zbctl.

    TLS is disabled in zbctl but enabled for the gateway

    zbctl will fail with the following error:

    Error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection closed
    

    Solution: Either enable TLS in the client by specifying a path to a certificate or disable it in the gateway by editing the appropriate configuration file.

    TLS is enabled for both client and gateway but the CA certificate can't be found

    zbctl will fail with the following error:

    Error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority
    

    Solution: Either install the CA certificate in the appropriate location for the system or specify a path to certificate using the methods described above.

    Authorization

    Zeebe clients also provide a way for users to modify gRPC call headers, namely to contain access tokens. Note that the gateway doesn't provide any way of validating these headers, so users must implement a reverse proxy with a gRPC interceptor to validate them.
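
    As a rough sketch of what such a check could look like on the proxy side (this is not part of Zeebe, and the validate function below is a hypothetical placeholder for your own token validation logic), a standard gRPC unary server interceptor can reject calls without a valid Authorization header:

    package authproxy

    import (
        "context"

        "google.golang.org/grpc"
        "google.golang.org/grpc/codes"
        "google.golang.org/grpc/metadata"
        "google.golang.org/grpc/status"
    )

    // NewAuthInterceptor returns a unary server interceptor that rejects calls
    // without a valid Authorization header.
    func NewAuthInterceptor(validate func(token string) bool) grpc.UnaryServerInterceptor {
        return func(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo,
            handler grpc.UnaryHandler) (interface{}, error) {
            md, ok := metadata.FromIncomingContext(ctx)
            if !ok {
                return nil, status.Error(codes.Unauthenticated, "missing request metadata")
            }
            auth := md.Get("authorization")
            if len(auth) == 0 || !validate(auth[0]) {
                return nil, status.Error(codes.Unauthenticated, "missing or invalid access token")
            }
            return handler(ctx, req)
        }
    }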

    Users can modify gRPC headers using Zeebe's built-in OAuthCredentialsProvider, which uses user-specified credentials to contact an OAuth authorization server. The authorization server should return an access token that is then appended to each gRPC request. Although OAuthCredentialsProvider is configured to use a Camunda Cloud authorization server by default, it can be configured to use any user-defined server. Users can also write a custom CredentialsProvider. In the following sections, we describe the CredentialsProvider interface as well as the built-in implementation.

    Credentials Provider

    As previously mentioned, the CredentialsProvider's purpose is to modify the gRPC headers with an authorization method such that a reverse proxy sitting in front of the gateway can validate them. The interface consists of an applyCredentials method and a shouldRetryRequest method. The first is called for each gRPC call and takes a map of headers to which it should add credentials. The second is called whenever a gRPC call fails; it receives the error that caused the failure and decides whether or not the request should be retried. The following sections implement a simple custom provider in Java and Go.

    Java

    public class MyCredentialsProvider implements CredentialsProvider {
        /**
         * Adds a token to the Authorization header of a gRPC call.
        */
        @Override
        public void applyCredentials(final Metadata headers) {
          final Key<String> authHeaderKey = Key.of("Authorization", Metadata.ASCII_STRING_MARSHALLER);
          headers.put(authHeaderKey, "Bearer someToken");
        }
    
        /**
        * Retries request if it failed with a timeout.
        */
        @Override
        public boolean shouldRetryRequest(final Throwable throwable) {
          return Status.fromThrowable(throwable).getCode() == Status.Code.DEADLINE_EXCEEDED;
        }
    }
    

    After implementing the CredentialsProvider, we can simply provide it when building a client:

    public class SecureClient {
        public static void main(final String[] args) {
          final ZeebeClient client = ZeebeClient.newClientBuilder().credentialsProvider(new MyCredentialsProvider()).build();
    
          // continue...
        }
    }
    

    Go

    package main
    
    import (
        "context"
        "fmt"
        "google.golang.org/grpc/status"
        "google.golang.org/grpc/codes"
        "github.com/zeebe-io/zeebe/clients/go/pkg/zbc"
    )
    
    type MyCredentialsProvider struct {
    }
    
    // ApplyCredentials adds a token to the Authorization header of a gRPC call.
    func (p *MyCredentialsProvider) ApplyCredentials(ctx context.Context, headers map[string]string) error {
        headers["Authorization"] = "someToken"
        return nil
    }
    
    // ShouldRetryRequest returns true if the call failed with a deadline exceed error.
    func (p *MyCredentialsProvider) ShouldRetryRequest(ctx context.Context, err error) bool {
        return status.Code(err) == codes.DeadlineExceeded
    }
    
    func main() {
        client, err := zbc.NewClient(&zbc.ClientConfig{
        CredentialsProvider: &MyCredentialsProvider{},
        })
        if err != nil {
          panic(err)
        }
    
        ctx := context.Background()
        response, err := client.NewTopologyCommand().Send(ctx)
        if err != nil {
          panic(err)
        }
    
        fmt.Println(response.String())
    }
    

    OAuthCredentialsProvider

    The OAuthCredentialsProvider requires the specification of a client ID and a client secret. These are then used to request an access token from an OAuth 2.0 authorization server through a client credentials flow. By default, the authorization server is the one used by Camunda Cloud, but any other can be used. The OAuthCredentialsProvider adds the access token returned by the authorization server to the gRPC headers of each request as a bearer token. Requests which fail with an UNAUTHENTICATED gRPC code are seamlessly retried if and only if a new access token can be obtained.

    Java

    To use the Zeebe client with Camunda Cloud, an OAuthCredentialsProvider must first be created and configured with the appropriate client credentials. The audience should be equivalent to the cluster endpoint, without a port number.

    public class AuthorizedClient {
        public static void main(String[] args) {
            final OAuthCredentialsProvider provider =
              new OAuthCredentialsProviderBuilder()
                  .clientId("clientId")
                  .clientSecret("clientSecret")
                  .audience("cluster.endpoint.com")
                  .build();
    
            final ZeebeClient client =
                new ZeebeClientBuilderImpl()
                    .gatewayAddress("cluster.endpoint.com:443")
                    .credentialsProvider(provider)
                    .build();
    
            System.out.println(client.newTopologyRequest().send().join().toString());
        }
    }
    

    For security reasons, client secrets should not be hardcoded. Therefore, the recommended way of passing client secrets into Zeebe is to use environment variables. Although several variables are supported, the ones required to set up a minimal client are ZEEBE_CLIENT_ID and ZEEBE_CLIENT_SECRET. After setting these variables to the correct values, the following would be equivalent to the previous code:

    public class AuthorizedClient {
        public static void main(final String[] args) {
            final ZeebeClient client =
                new ZeebeClientBuilderImpl()
                    .gatewayAddress("cluster.endpoint.com:443")
                    .build();
    
            System.out.println(client.newTopologyRequest().send().join().toString());
        }
    }
    

    The client will create an OAuthCredentialsProvider with the credentials specified through the environment variables; the audience will be extracted from the address specified through the ZeebeClientBuilder.

    Note: Zeebe's Java client will not prevent you from adding credentials to gRPC calls while using an insecure connection, but you should be aware that doing so will expose your access token by transmitting it in plaintext.

    Go

    package main
    
    import (
        "context"
        "fmt"
        "github.com/zeebe-io/zeebe/clients/go/pkg/zbc"
    )
    
    func main() {
        credsProvider, err := zbc.NewOAuthCredentialsProvider(&zbc.OAuthProviderConfig{
            ClientID:     "clientId",
            ClientSecret: "clientSecret",
            Audience:     "cluster.endpoint.com",
        })
        if err != nil {
            panic(err)
        }
    
        client, err := zbc.NewClient(&zbc.ClientConfig{
            GatewayAddress:      "cluster.endpoint.com:443",
            CredentialsProvider: credsProvider,
        })
        if err != nil {
            panic(err)
        }
    
    
        ctx := context.Background()
        response, err := client.NewTopologyCommand().Send(ctx)
        if err != nil {
            panic(err)
        }
    
        fmt.Println(response.String())
    }
    

    As was the case with the Java client, it's possible to make use of the ZEEBE_CLIENT_ID and ZEEBE_CLIENT_SECRET environment variables to simplify the client configuration:

    package main
    
    import (
        "context"
        "fmt"
        "github.com/zeebe-io/zeebe/clients/go/pkg/zbc"
    )
    
    func main() {
        client, err := zbc.NewClient(&zbc.ClientConfig{
            GatewayAddress: "cluster.endpoint.com:443",
        })
        if err != nil {
            panic(err)
        }
    
        ctx := context.Background()
        response, err := client.NewTopologyCommand().Send(ctx)
        if err != nil {
            panic(err)
        }
    
        fmt.Println(response.String())
    }
    

    Note: Like the Java client, the Go client will not prevent you from adding credentials to gRPC calls while using an insecure connection but doing so will expose your access token.

    Environment Variables

    Since there are several environment variables that can be used to configure an OAuthCredentialsProvider, we list them here along with their uses:

    • ZEEBE_CLIENT_ID - the client ID used to request an access token from the authorization server
    • ZEEBE_CLIENT_SECRET - the client secret used to request an access token from the authorization server
    • ZEEBE_TOKEN_AUDIENCE - the address for which the token should be valid
    • ZEEBE_AUTHORIZATION_SERVER_URL - the URL of the authorization server from which the access token will be requested (by default, configured for Camunda Cloud)
    • ZEEBE_CLIENT_CONFIG_PATH - the path to a cache file where the access tokens will be stored (by default, it's $HOME/.camunda/credentials)
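
    For example, a client targeting a self-managed authorization server could be configured entirely through these variables; the values below are placeholders:

    export ZEEBE_CLIENT_ID='myClientId'
    export ZEEBE_CLIENT_SECRET='myClientSecret'
    export ZEEBE_TOKEN_AUDIENCE='cluster.endpoint.com'
    export ZEEBE_AUTHORIZATION_SERVER_URL='https://my-auth-server.example.com/oauth/token'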

    Health Status

    Broker

    The Zeebe broker exposes two HTTP endpoints to query its health status:

    • Ready check
    • Health check

    Ready Check

    The ready check endpoint is exposed via http://{zeebe-broker}:{zeebe.broker.network.monitoringApi.port}/ready (by default port 9600). If the broker is ready, this endpoint returns an empty 204 response; if it is not ready, it returns a 503 error.

    A broker is ready when it has installed all necessary services to start processing in all partitions. That a broker is ready does not mean it is the leader of its partitions; it only means that it is participating in replication and can be either a leader or a follower of each partition assigned to it. Once a broker is ready, it never becomes unready again.

    A ready check is useful, for example, as a readinessProbe in a Kubernetes configuration to control when a pod can be restarted during a rolling upgrade. Depending on the cluster configuration, restarting one pod before the previous one is ready might make the system unavailable because the quorum of replicas is not available. By configuring a readinessProbe that uses the ready check endpoint, we can inform Kubernetes when it is safe to proceed with the rolling update, as sketched below.
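
    A minimal sketch of such a readinessProbe in a broker pod or StatefulSet spec; the path and port follow the defaults described above, while the timing values are only illustrative:

    readinessProbe:
      httpGet:
        path: /ready
        port: 9600
      initialDelaySeconds: 30
      periodSeconds: 10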

    Health Check

    The health check endpoint is exposed via http://{zeebe-broker}:{zeebe.broker.network.monitoringApi.port}/health (by default port 9600). This endpoint returns an empty 204 response if the broker is healthy; if it is not healthy, it returns a 503 error. A broker is never healthy before it is ready. Unlike the ready check, a broker can become unhealthy again after having been healthy, so the health check gives a better picture of the current status of a running broker.

    A broker is healthy when it can process workflows, accept commands, and perform all its expected tasks. If it is unhealthy, this can mean one of three things:

    • it is only temporarily unhealthy, e.g. due to environmental circumstances such as temporary I/O issues
    • it is partially unhealthy, meaning that one or more partitions are unhealthy while the rest of them are able to process workflows
    • it is completely dead

    Metrics give more insight into which partition is healthy or unhealthy. When a broker becomes unhealthy, it is recommended to check the logs to see what went wrong.

    Gateway

    Zeebe gateway exposes three HTTP endpoints to query its health status:

    • Health Status - http://{zeebe-gateway}:9600/health
    • Startup Probe - http://{zeebe-gateway}:9600/actuator/health/startup
    • Liveness Probe - http://{zeebe-gateway}:9600/actuator/health/liveness

    (The default port can be changed in the configuration: {zeebe.gateway.monitoring.port})

    Health Status

    The gateway is healthy if it:

    • Started successfully
    • Has sufficient free memory and disk space to work with
    • Is able to respond to requests within a defined timeout
    • Is aware of other nodes in the cluster
    • Is aware of leaders for partitions

    Startup Probe

    The gateway is started if it has finished its boot sequence successfully and is ready to receive requests. It is no longer started once it has initiated the shutdown sequence.

    The startup probe can be used as a Kubernetes startup probe.

    Liveness Probe

    The gateway is live if it:

    • Started successfully
    • Has a minimal amount of free memory and disk space to work with
    • Is able to respond to requests within a defined timeout, or misses the timeout for less than 10 minutes
    • Is aware of other nodes in the cluster, or lost awareness of other nodes for less than 5 minutes
    • Is aware of leaders for partitions, or lost awareness of partition leaders for less than 5 minutes

    The liveness probe can be used as Kubernetes liveness probe.
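
    A sketch of how the gateway's startup and liveness endpoints could be wired into Kubernetes probes; the paths and port follow the defaults listed above, while the timing values are only illustrative:

    startupProbe:
      httpGet:
        path: /actuator/health/startup
        port: 9600
      periodSeconds: 10
      failureThreshold: 30
    livenessProbe:
      httpGet:
        path: /actuator/health/liveness
        port: 9600
      periodSeconds: 10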

    Status Responses

    Each endpoint returns a status, which can be one of:

    • UNKNOWN (HTTP status code 200)
    • UP (HTTP status code 200)
    • DOWN (HTTP status code 503)
    • OUT_OF_SERVICE (HTTP status code 503)

    If details are enabled (the default), the response will also contain additional details.
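
    For example, querying the liveness endpoint with curl should yield a response along these lines; the exact detail fields depend on your configuration, so treat this output as illustrative only:

    > curl http://localhost:9600/actuator/health/liveness
    {"status":"UP"}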

    Customization

    Health indicators are set to sensible defaults. For specific use cases, it might be necessary to customize health indicators.

    Backpressure

    When a broker receives a client request, it is written to the event stream first (see section Internal Processing for details), and processed later by the stream processor. If the processing is slow or if there are many client requests in the stream, it might take too long for the processor to start processing the command. If the broker keeps accepting new requests from the client, the backlog increases and the processing latency can grow beyond an acceptable time. To avoid such problems, Zeebe employs a backpressure mechanism. When the broker receives more requests than it can process with an acceptable latency, it rejects some requests (see section Error handling).

    Terminology

    • RTT - the time between when a request is accepted by the broker and when the response for that request is sent back to the gateway.
    • inflight count - the number of requests accepted by the broker for which a response has not yet been sent.
    • limit - the maximum number of in-flight requests. When the inflight count is above the limit, any new incoming request is rejected.

    Note that the limit and inflight count are calculated per partition.

    Backpressure algorithms

    Zeebe uses adaptive algorithms from concurrency-limits to dynamically calculate the limit. Zeebe can be configured with one of the following backpressure algorithms.

    Fixed Limit

    With “fixed limit” one can configure a fixed value for the limit. Zeebe operators are recommended to evaluate the latencies observed with different values of the limit. Note that with different cluster configurations, you may have to choose different limit values.
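
    As a hedged sketch, the algorithm is selected in the broker configuration file; the keys shown here are illustrative and may differ between Zeebe versions, so consult the configuration template shipped with your distribution for the algorithm-specific settings (such as the actual fixed limit value):

    zeebe:
      broker:
        backpressure:
          # backpressure is enabled by default
          enabled: true
          # one of: fixed, aimd, vegas, gradient, gradient2
          algorithm: fixed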

    AIMD

    AIMD calculates the limit based on the configured requestTimeout. When the RTT for a request is below the requestTimeout, the limit is increased by 1. When the RTT is longer than the requestTimeout, the limit is reduced according to the configured backoffRatio.

    Vegas

    Vegas is an adaptive limit algorithm based on the TCP Vegas congestion control algorithm. Vegas estimates a base latency as the minimum observed latency. This base RTT is the expected latency when there is no load. Whenever the RTT deviates from the base RTT, a new limit is calculated based on the Vegas algorithm. Vegas allows you to configure two parameters: alpha and beta. The values correspond to a queue size that is estimated by the Vegas algorithm based on the observed RTT, base RTT, and current limit. When the queue size is below alpha, the limit is increased. When the queue size is above beta, the limit is decreased.

    Gradient

    Gradient is an adaptive limit algorithm that dynamically calculates the limit based on the observed RTT. In the gradient algorithm, the limit is adjusted based on the gradient of the observed RTT and an observed minimum RTT. If the gradient is less than 1, the limit is decreased; otherwise, the limit is increased.

    Gradient2

    Gradient2 is similar to Gradient, but instead of using the observed minimum RTT as the base, it uses an exponentially smoothed average RTT.

    Backpressure Tuning

    The goal of backpressure is to keep the processing latency low. The processing latency is calculated as the time from when a command is written to the event stream until it is processed. Hence, to see how backpressure behaves, you can run a benchmark on your cluster and observe the following metrics:

    • zeebe_stream_processor_latency_bucket
    • zeebe_dropped_request_count_total
    • zeebe_received_request_count_total
    • zeebe_backpressure_requests_limit

    You may want to run the benchmark with different loads:

    1. With low load - where the number of requests sent per second is low.
    2. With high load - where the number of requests sent per second is above what Zeebe can process within a reasonable latency.

    If the value of the limit is small, the processing latency will be small but the number of rejected requests may be high. If the value of the limit is large, fewer requests may be rejected (depending on the request rate), but the processing latency may increase.

    When using "fixed limit", you can run the benchmark with different values for the limit. You can then determine a suitable value for a limit for which the processing latency (zeebe_stream_processor_latency_bucket) is within the desired latency.

    When using "AIMD", you can configure a requestTimeout which corresponds to a desired latency. Note that during high load, "AIMD" can lead to a processing latency of up to two times the configured requestTimeout. It is also recommended to configure a minLimit to prevent the limit from dropping aggressively during constant high load.

    When using "Vegas", you cannot configure the backpressure to a desired latency. Instead, Vegas tries to keep the RTT as low as possible based on the observed minimum RTT.

    Similar to "Vegas", you cannot configure a desired latency in "Gradient" and "Gradient2". They calculate the limit based on the gradient of the observed RTT relative to the expected RTT. The higher the value of rttTolerance, the larger the deviations that are tolerated, which results in higher values for the limit.

    If a lot of requests are rejected due to backpressure, it might indicate that the processing capacity of the cluster is not enough to handle the expected throughput. If this is the expected workload, then you might consider a different configuration for the cluster, such as provisioning more resources or increasing the number of nodes and partitions.

    Disk space

    Zeebe uses the local disk for storage of its persistent data. Therefore, if the Zeebe broker runs out of disk space, the system is in an invalid state, as the broker cannot update its state.

    To prevent the system from ending up in an unrecoverable state, Zeebe expects a minimum amount of free disk space to be available. If this limit is violated, the broker rejects new requests, allowing the operations team to free more disk space so that the broker can continue to update its state.

    Zeebe can be configured with the following settings for the disk usage watermarks:

    • zeebe.broker.data.diskUsageMonitoringEnabled: configure if disk usage should be monitored (default: true)
    • zeebe.broker.data.diskUsageReplicationWatermark: the fraction of used disk space before the replication is paused (default: 0.99)
    • zeebe.broker.data.diskUsageCommandWatermark: the fraction of used disk space before new user commands are rejected (default: 0.97); this has to be less than diskUsageReplicationWatermark
    • zeebe.broker.data.diskUsageMonitoringInterval: the interval in which the disk space usage is checked (default 1 second)

    For production use cases we recommend setting diskUsageReplicationWatermark and diskUsageCommandWatermark to lower values, for example diskUsageReplicationWatermark=0.9 and diskUsageCommandWatermark=0.8.
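
    A sketch of these settings in the broker configuration file, using the lower production watermarks recommended above; the keys are the ones listed in this section, while the YAML nesting and the interval value format are assumptions, so verify them against the configuration template of your distribution:

    zeebe:
      broker:
        data:
          diskUsageMonitoringEnabled: true
          # pause replication at 90% disk usage
          diskUsageReplicationWatermark: 0.9
          # reject new user commands at 80% disk usage
          diskUsageCommandWatermark: 0.8
          diskUsageMonitoringInterval: 1s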

    Zeebe on Kubernetes

    Zeebe on K8s

    This section covers the fundamentals of how to run Zeebe in Kubernetes. There are several ways to deploy applications to a Kubernetes cluster; the following sections use Helm charts to deploy a set of components into your cluster.

    Helm allows you to choose exactly which chart (set of components) you want to install and how these components need to be configured. These Helm charts are continuously being improved and released to the official Zeebe Helm chart repository: http://helm.zeebe.io

    You are free to choose your Kubernetes provider; our Helm charts are not cloud-provider specific, and we encourage you to report any issues you find.

    You can also join us in Slack at: https://zeebe-slack-invite.herokuapp.com/

    This document is divided into the following sections:

    Prerequisites

    In order to use Kubernetes you need to have the following tools installed in your local environment:

    • kubectl: Kubernetes Control CLI tool: installed and connected to your cluster
    • helm: Kubernetes Helm CLI tool

    You also need a Kubernetes cluster; here you have several options:

    • Local for Development you can use: Kubernetes KIND, Minikube, MicroK8s
    • Remote: Google GKE, Azure AKS, Amazon EKS, etc.

    Notice that you can use free trials from different cloud providers to create a Kubernetes cluster to test Zeebe in the cloud.

    Optional tools related to Zeebe

    • Zeebe Modeler: to model/modify business processes Zeebe Modeler Releases
    • Zeebe CTL(zbctl): command line tool to interact with a Zeebe Cluster (local/remote). You can get the zbctl tool from the official Zeebe Release Page

    Zeebe Helm Charts

    Helm is a package manager for Kubernetes resources. Helm allows us to install a set of components by just referencing a package name, and it allows us to override configurations to adapt these packages to different scenarios. Helm also provides dependency management between charts, meaning that charts can depend on other charts, allowing us to aggregate a set of components together that can be installed with a single command.

    As part of the Zeebe project, we are providing 3 Zeebe Helm Charts:

    Charts

    Initializing Helm in your Cluster

    You need to have kubectl already configured against a Kubernetes Cluster to install Helm (server side/tiller) into your cluster.

    Here you can download the helm-service-account-role.yaml file

    You also need to have the helm CLI tool installed as listed in the prerequisites section.

    > kubectl apply -f helm-service-account-role.yaml
    > helm init --service-account helm --upgrade 
    

    This installs Helm's server-side components in your cluster and enables the helm CLI tool to install Helm charts into your environment.

    Add Zeebe Helm Repository

    The next step is to add the Zeebe official Helm chart repository to your installation. Once this is done, Helm will be able to fetch and install charts hosted at http://helm.zeebe.io.

    > helm repo add zeebe https://helm.zeebe.io
    > helm repo update
    

    Once this is done, we are ready to install any of the Helm Charts hosted in the official Zeebe Helm Chart repo.

    Install Zeebe Full Helm Chart (Zeebe Cluster + Operate + Ingress Controller)

    In this section we are going to install all the available Zeebe components inside a Kubernetes cluster. Notice that this Kubernetes cluster can already have services running, and Zeebe is going to be installed as just another set of services.

    > helm install --name <RELEASE NAME> zeebe/zeebe-full
    

    Note: replace <RELEASE NAME> with a name of your choice. Notice that you can add the -n flag to specify the Kubernetes namespace in which the components should be installed.

    Installing all the components in a cluster requires all the Docker images to be downloaded to the remote cluster. Depending on which cloud provider you are using, the amount of time it takes to fetch all the images will vary.

    If you are using Kubernetes KIND add -f kind-values.yaml

    The kind-values.yaml file can be downloaded here.

    > helm install --name <RELEASE NAME> zeebe/zeebe-full -f kind-values.yaml
    

    This will deploy the same components but with a set of parameters tailored to a local environment setup.

    Note that all the Docker images will be downloaded to your local KIND cluster, so it might take some time for the services to get started.

    You can check the progress of your deployment by checking if the Kubernetes PODs are up and running with:

    > kubectl get pods
    

    which returns something like:

    NAME                                                   READY   STATUS    RESTARTS   AGE
    elasticsearch-master-0                                 1/1     Running   0          4m6s
    elasticsearch-master-1                                 1/1     Running   0          4m6s
    elasticsearch-master-2                                 1/1     Running   0          4m6s
    <RELEASE NAME>-nginx-ingress-controller-5cf6dd7894-kc25s      1/1     Running   0          4m6s
    <RELEASE NAME>-nginx-ingress-default-backend-f5454db5-j9vh6   1/1     Running   0          4m6s
    <RELEASE NAME>-operate-5d4867d6d-h9zqw                        1/1     Running   0          4m6s
    <RELEASE NAME>-zeebe-0                                        1/1     Running   0          4m6s
    <RELEASE NAME>-zeebe-1                                        1/1     Running   0          4m6s
    <RELEASE NAME>-zeebe-2                                        1/1     Running   0          4m6s
    

    Check that each Pod has at least 1/1 running instances. You can always tail the logs of these pods by running:

    > kubectl logs -f <POD NAME> 
    

    In order to interact with the services inside the cluster you need to use port-forward to route traffic from your environment to the cluster.

    > kubectl port-forward svc/<RELEASE NAME>-zeebe 26500:26500
    

    Now you can connect and execute operations against your newly created Zeebe cluster.

    Notice that you need to keep port-forward running to be able to communicate with the remote cluster.

    Notice that accessing the Zeebe cluster directly using kubectl port-forward is recommended for development purposes only. By default, the Zeebe Helm charts do not expose the Zeebe cluster via Ingress. If you want to use zbctl or a local client/worker from outside the Kubernetes cluster, you need to rely on kubectl port-forward to the Zeebe cluster to communicate.

    Accessing Operate from outside the cluster

    The Zeebe Full Helm Charts install an Ingress Controller. If this is deployed in a cloud provider (GKE, EKS, AKS, etc.), it should provision a LoadBalancer which will expose an External IP that can be used as the main entry point to access all the services/applications that are configured to have Ingress Routes.

    If you have your own Ingress Controller, you can use the child chart for installing a Zeebe Cluster, instead of using the Parent Chart.

    You can find the External IP by running:

    > kubectl get svc
    

    You should see something like:

    NAME                                    TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                                  AGE
    <RELEASE NAME>-nginx-ingress-controller        LoadBalancer   10.109.108.4     <pending>     80:30497/TCP,443:32232/TCP               63m
    

    Where the <pending> under the EXTERNAL-IP column should change to a public IP that you (and other users) should be able to access from outside the Cluster. You might need to check your Cloud Provider specific configuration if that doesn't work.

    Then you should be able to access Operate pointing your browser at http://

    If you are running in Kubernetes KIND, you will need to port-forward to the Ingress Controller's main entry point because KIND doesn't support LoadBalancers. You can do that by running the following in a different terminal:

    > kubectl port-forward svc/<RELEASE NAME>-nginx-ingress-controller 8080:80
    

    Then you should be able to access Operate by pointing your browser at http://localhost:8080

    Operate Login

    Use demo/demo as credentials.

    Operate Login

    If you deploy Process Definitions, they will appear in the dashboard and then you can drill down to see your active instances. You can deploy and create new instances using the Zeebe Clients or zbctl.

    Zeebe Operator (Experimental)

    The Zeebe Kubernetes Operator was born out of the need to manage more than one Zeebe cluster running inside Kubernetes clusters. Zeebe clusters have their own lifecycle, and in real implementations the need to update, monitor, and manage some of these cluster components while applications are running becomes challenging. The objective of the Zeebe Kubernetes Operator is to simplify and natively integrate Zeebe with Kubernetes, to reduce the operational burden, and to facilitate the creation and maintenance of a set of clusters.

    This operator has been built with Kubernetes Helm in mind, meaning that at the end of the day this operator will be in charge of managing Helm charts. If you are not familiar with Helm, it is a package manager for Kubernetes which helps us to package and distribute Kubernetes manifests. Helm also deals with installing, labeling, and dependency management between packages (charts). Because we already have Zeebe Helm packages here: http://helm.zeebe.io, which are automatically versioned and released, the Zeebe Kubernetes Operator will use these charts to create and manage new clusters and other related components.

    Because we are in the Kubernetes realm, we need to provide a declarative way of stating that we want a new Zeebe cluster to be provisioned. For this reason, the ZeebeCluster Custom Resource Definition (CRD) is introduced. This resource contains all the information needed to provision a cluster, and it will also reflect the current cluster status. The Zeebe Kubernetes Operator is built to monitor ZeebeCluster resources and interact with the Kubernetes APIs under the hood to make sure that the Zeebe cluster is provisioned, upgraded, or deleted correctly.

    Getting Started

    The Zeebe Kubernetes Operator can be installed using Helm, as it is provided as a Helm chart as well. In contrast with the zeebe-cluster, zeebe-operate and zeebe-full charts, installing the operator chart by itself doesn't install any Zeebe cluster, but it allows you to do that by creating ZeebeCluster CRD resources.

    The following steps will guide you through installing the Operator with Helm 3 (which is the default version now).

    This will also work if you have correctly installed Helm2 in your cluster with tiller. Add the Zeebe Helm Repository:

    helm repo add zeebe https://helm.zeebe.io
    helm repo update

    Now you are ready to install the Zeebe Kubernetes Operator:

    helm install zeebe-operator zeebe/zeebe-operator

    Create my-zeebe-cluster.yaml

    apiVersion: zeebe.zeebe.io/v1
    kind: ZeebeCluster
    metadata:
      name: my-zeebe-cluster
    

    Create the resource within the Kubernetes cluster with:

    kubectl apply -f my-zeebe-cluster.yaml

    This will create a new namespace with the name stated in the ZeebeCluster resource (ZeebeCluster.metadata.name) and provision a new Zeebe cluster plus Elasticsearch by default.

    Future versions will allow you to specify in the ZeebeCluster resource which ElasticSearch instance to use.

    Notice that the first time you provision a cluster, the Docker images will be downloaded to the Kubernetes nodes, so the first cluster might take more time to be provisioned.

    You can now query for your Zeebe Clusters using the kubectl CLI:

    kubectl get zb

    If you delete the ZeebeCluster resource the actual ZeebeCluster will be automatically removed from your cluster. Now you can check that there is a new “Namespace” created with:

    kubectl get ns

    And also check that the cluster is correctly provisioned by looking at the Pods created inside the newly created namespace with

    kubectl get pods -n <namespace> -w

    The next video shows these commands in action, along with the installation of the Zeebe Kubernetes Operator:

    Intro video

    Technical Details and Dependencies

    This Kubernetes Operator was built using KubeBuilder V2.1+, Tekton 0.8.0+ and Helm 3.

    The Operator currently defines one CRD (Custom Resource Definition): ZeebeCluster; in future versions, new types will be defined for other components such as Zeebe Operate and workers. The ZeebeCluster resource represents a low-level resource which will instantiate a Zeebe cluster based on predefined parameters. This low-level resource definition can be used to define the cluster topology and HA configurations.

    The Zeebe Kubernetes Operator was built using the KubeBuilder framework for writing the controller's logic and scaffolding the CRD type. Internally it interacts with Tekton Pipelines in order to install and manage Zeebe Helm charts. The project itself is built, released and tested using Jenkins X. This leads to some changes in how the KubeBuilder project is structured, because in its default shape the project is not organized in a way that makes it easy to create a Helm chart out of it.

    The main flow of the Operator works like this: Flow

    First, the Operator will be constantly looking for ZeebeCluster resources. When one is found, a new Namespace is created, and a Tekton Task and TaskRun are created to “upgrade” the Helm charts defined inside the Version Stream repository (hosted here: https://github.com/zeebe-io/zeebe-version-stream-helm).

    This repository (referred to as the Version Stream Repository) contains a list of blessed versions that will be installed when a new ZeebeCluster resource is detected by the operator. Using a Version Stream Repository gives us the flexibility to evolve the operator code and the charts that define what needs to be provisioned independently. This allows for a simple upgrade path to future versions by using a Git repository as a central reference to a stable version.

    In future versions, the option to choose a version stream repository will be provided, allowing different streams.

    The Task created in Tekton Pipelines executes two basic operations:

    • First, it clones the Version Stream Repository (using a simple git clone) and runs a Helm upgrade of the chart defined in the Version Stream Repository (this will automatically upgrade or install the chart if it doesn't exist).

    • Then, running the Helm upgrade/install creates a Helm release which can be upgraded if new versions of the charts are available. These releases can be queried using the Helm CLI tool: helm list --all-namespaces.

    Once the Task is created, an execution is triggered by the creation of a TaskRun (an actual instance of the task), and the operator will monitor this task until it is completed. Once the task is completed, the Operator watches for the Zeebe cluster to be provisioned. More specifically, the Operator will look for a StatefulSet (the Zeebe broker nodes) with a set of labels matching the ZeebeCluster name, inside the created namespace.

    Once the StatefulSet is located, the Operator assigns the ZeebeCluster resource as the owner of this StatefulSet; hence it will be notified about the changes emitted by the resources associated with the StatefulSet. This allows the Operator to externalize a health status of the Zeebe cluster at any given point, reflecting the actual state of the cluster itself.

    Operate User Guide

    Operate is a tool for monitoring and troubleshooting workflow instances running in Zeebe.

    In addition to providing visibility into active and completed workflow instances, Operate also makes it possible to carry out key operations such as resolving incidents and updating workflow instance variables.

    In the Getting Started tutorial, we walk through how to install and run Operate and how to use it to monitor workflow instances. In this Operate User Guide, we’ll cover some of Operate’s more advanced features.

    Because Operate can be a helpful tool when getting started with Zeebe and building an initial proof of concept, we make it available under a developer license for free, non-production use. You can find the developer license here. There are no restrictions under this license when it comes to the length of the evaluation period or the available feature set as long as you use Operate in non-production environments only.

    Operate is also available for production use (with support) in the Camunda Cloud offering. If you'd like to try out Operate in Camunda Cloud, please sign up here.

    Install & Start Operate

    Running via Docker (Local Development)

    The easiest way to run Operate in development is with Docker. This gives you a consistent, reproducible environment and an out-of-the-box integrated experience for experimenting with Zeebe and Operate.

    To do this, you need Docker Desktop installed on your development machine.

    The zeebe-docker-compose repository contains an operate profile that starts a single Zeebe broker with Operate and all its dependencies. See the README file in the repository for instructions to start Zeebe and Operate using Docker.

    If you are using Docker, once you follow the instructions in the repository, skip ahead to the section "Access the Operate Web Interface”.

    Running with Kubernetes (Production)

    We will update this section after Operate is available for production use.

    Running Operate with Kubernetes will be recommended for production deployments.

    Manual Configuration (Local Development)

    Here, we’ll walk you through how to download and run an Operate distribution manually, without using Docker.

    Note that the Operate web UI is available by default at http://localhost:8080, so please be sure this port is available.

    Download Operate and a compatible version of Zeebe.

    Operate and Zeebe distributions are available for download on the same release page.

    Note that each version of Operate is compatible with a specific version of Zeebe. For example: Zeebe 0.18.0 and Operate-1.0.0-alpha11 are compatible.

    On the Zeebe release page, compatible versions of Zeebe and Operate are grouped together. Please be sure to download and use compatible versions. This is handled for you if you use the Docker profile from our repository.

    Download Elasticsearch

    Operate uses open-source Elasticsearch as its underlying data store, and so to run Operate, you need to download and run Elasticsearch.

    Operate is currently compatible with Elasticsearch 6.8.1. You can download Elasticsearch here.

    Run Elasticsearch

    To run Elasticsearch, execute the following commands in Terminal or another command line tool of your choice:

    cd elasticsearch-*
    bin/elasticsearch
    

    You’ll know Elasticsearch has started successfully when you see a message similar to:

    [INFO ][o.e.l.LicenseService     ] [-IbqP-o] license [72038058-e8ae-4c71-81a1-e9727f2b81c7] mode [basic] - valid
    

    Run Zeebe

    To run Zeebe, execute the following commands:

    cd zeebe-broker-*
    ./bin/broker
    

    You’ll know Zeebe has started successfully when you see a message similar to:

    [partition-0] [0.0.0.0:26501-zb-actors-0] INFO  io.zeebe.raft - Joined raft in term 0
    [exporter] [0.0.0.0:26501-zb-actors-1] INFO  io.zeebe.broker.exporter.elasticsearch - Exporter opened
    

    Run Operate

    To run Operate, execute the following commands:

    cd camunda-operate-distro-1.0.0-*
    bin/operate

    You’ll know Operate has started successfully when you see messages similar to:

    DEBUG 1416 --- [       Thread-6] o.c.o.e.w.BatchOperationWriter           : 0 operations locked
    DEBUG 1416 --- [       Thread-4] o.c.o.z.ZeebeESImporter                  : Latest loaded position for alias [zeebe-record-deployment] and partitionId [0]: 0
    INFO 1416 --- [       Thread-4] o.c.o.z.ZeebeESImporter                  : Elasticsearch index for ValueType DEPLOYMENT was not found, alias zeebe-record-deployment. Skipping.
    

    Access the Operate Web Interface

    The Operate web interface is available at http://localhost:8080.

    The first screen you'll see is a sign-in page. Use the credentials demo / demo to sign in.

    After you sign in, you'll see an empty dashboard if you haven't yet deployed any workflows:

    operate-dash-no-workflows

    If you have deployed workflows or created workflow instances, you'll see those on your dashboard:

    operate-dash-with-workflows

    Getting Familiar With Operate

    This section "Getting Familiar With Operate" and the next section “Incidents and Payloads” assume that you’ve deployed a workflow to Zeebe and have created at least one workflow instance.

    If you’re not sure how to deploy workflows or create instances, we recommend going through the Getting Started tutorial.

    In the following sections, we’ll use the same order-process.bpmn workflow model from the Getting Started guide.

    View A Deployed Workflow

    In the “Instances by Workflow” panel in your dashboard, you should see a list of your deployed workflows and running instances.

    operate-dash-with-workflows

    When you click on the name of a deployed workflow in the “Instances by Workflow” panel, you’ll navigate to a view of that workflow model along with all running instances.

    operate-view-workflow

    From this “Running Instances” view, you have the ability to cancel a single running workflow instance.

    operate-cancel-workflow-instance

    Inspect A Workflow Instance

    Running workflow instances appear in the “Instances” section below the workflow model. To inspect a specific instance, you can click on the instance ID.

    operate-inspect-instance

    There, you’ll be able to see detail about the workflow instance, including the instance history and the variables attached to the instance.

    operate-view-instance-detail

    Update Variables & Resolve Incidents

    Every workflow instance created for the workflow model used in the Getting Started tutorial requires an orderValue so that the XOR gateway evaluation will happen properly.

    Let’s look at a case where orderValue is present but was set as a string, while our order-process.bpmn model requires an integer in order to properly evaluate the orderValue and route the instance.

    Linux

    ./bin/zbctl --insecure create instance order-process --variables '{"orderId": "1234", "orderValue":"99"}'
    

    Mac

    ./bin/zbctl.darwin --insecure create instance order-process --variables '{"orderId": "1234", "orderValue":"99"}'
    

    Windows (Powershell)

    ./bin/zbctl.exe --insecure create instance order-process --variables '{\"orderId\": \"1234\", \"orderValue\": \"99\"}'
    

    To advance the instance to our XOR gateway, we’ll quickly create a job worker to complete the “Initiate Payment” task:

    Linux

    ./bin/zbctl --insecure create worker initiate-payment --handler cat
    

    Mac

    ./bin/zbctl.darwin --insecure create worker initiate-payment --handler cat
    

    Windows (Powershell)

    ./bin/zbctl.exe --insecure create worker initiate-payment --handler "findstr .*"
    

    And we’ll publish a message that will be correlated with the instance so we can advance past the “Payment Received” Intermediate Message Catch Event:

    Linux

    ./bin/zbctl --insecure publish message "payment-received" --correlationKey="1234"
    

    Mac

    ./bin/zbctl.darwin --insecure publish message "payment-received" --correlationKey="1234"
    

    Windows (Powershell)

    ./bin/zbctl.exe --insecure publish message "payment-received" --correlationKey="1234"
    

    In the Operate interface, you should now see that the workflow instance has an “Incident”, which means there’s a problem with workflow execution that needs to be fixed before the workflow instance can progress to the next step.

    operate-incident-workflow-view

    Operate provides tools for diagnosing and resolving incidents. Let’s go through incident diagnosis and resolution step-by-step.

    When we inspect the workflow instance, we can see exactly what our incident is: Expected to evaluate condition 'orderValue>=100' successfully, but failed because: Cannot compare values of different types: STRING and INTEGER

    operate-incident-instance-view

    We have enough information to know that to resolve this incident, we need to edit the orderValue variable so that it’s an integer. To do so, first click on the edit icon next to the variable you’d like to edit.

    operate-incident-edit-variable

    Next, edit the variable by removing the quotation marks from the orderValue value. Then click on the checkmark icon to save the change.

    We were able to solve this particular problem by editing a variable, but it’s worth noting that you can also add a variable if a variable is missing from a workflow instance altogether.

    operate-incident-save-variable-edit

    There’s one last step we need to take: initiating a “retry” of the workflow instance. There are two places on the workflow instance page where you can initiate a retry:

    operate-retry-instance

    You should now see that the incident has been resolved, and the workflow instance has progressed to the next step. Well done!

    operate-incident-resolved-instance-view

    If you’d like to complete the workflow instance, you can create a worker for the “Ship Without Insurance” task:

    Linux

    ./bin/zbctl --insecure create worker ship-without-insurance --handler cat
    

    Mac

    ./bin/zbctl.darwin --insecure create worker ship-without-insurance --handler cat
    

    Windows (Powershell)

    ./bin/zbctl.exe --insecure create worker ship-without-insurance --handler "findstr .*"
    

    Selections & Batch Operations

    In some cases, you’ll need to retry or cancel many workflow instances at once. Operate also supports this type of batch operation.

    Imagine a case where many workflow instances have an incident caused by the same issue. At some point, the underlying problem will have been resolved (for example, maybe a microservice was down for an extended period of time and then was brought back up).

    But even though the underlying problem was resolved, the affected workflow instances are stuck until they’re “retried”.

    operate-batch-retry

    Let's create a selection in Operate. A selection is simply a set of workflow instances on which you can carry out a batch retry or batch cancellation. To create a selection, check the box next to the workflow instances you'd like to include, then click on the blue “Create Selection” button.

    operate-batch-retry

    Your selection will appear in the right-side Selections panel.

    operate-batch-retry

    You can retry or cancel the workflow instances in the selection immediately, or you can come back to the selection later; your selection will remain saved. You can also add more workflow instances to the selection after it was initially created via the blue “Add To Selection” button.

    operate-batch-retry

    When you’re ready to carry out an operation, you can simply return to the selection and retry or cancel the workflow instances as a batch.

    operate-batch-retry

    If the operation was successful, the state of the workflow instances will be updated to active and without incident.

    operate-batch-retry

    Giving Feedback And Asking Questions

    If you have questions or feedback about Operate, the Zeebe user forum is the best way to get in touch with us. Please use the “Operate” tag when you create a post.

    If you’re interested in an Operate enterprise license so that you can use Operate in production, please contact us so that we can notify you when an enterprise edition of Operate becomes available.

    Did you find a problem in Operate? Or do you have a suggestion for an improvement? You can create an issue in the Operate JIRA project to let us know.

    Operate Deployment Guide

    To get started with Operate please follow the Operate User Guide.

    This section describes in more detail how Operate can be configured and scaled.

    Introduction

    Operate is a Spring Boot application. That means all ways to configure a Spring Boot application can be applied. By default, the configuration for Operate is stored in a YAML file, application.yml. All Operate-related settings are prefixed with camunda.operate. The following parts are configurable:

    Configurations

    Elasticsearch

    Operate stores and reads data in/from Elasticsearch.

    Settings to connect

    Operate supports basic authentication for Elasticsearch. Set the appropriate username/password combination in the configuration to use it.

    Either set host and port (deprecated) or url (recommended).

    Name                                        | Description                                 | Default value
    camunda.operate.elasticsearch.clusterName  | Cluster name of Elasticsearch               | elasticsearch
    camunda.operate.elasticsearch.host         | Hostname where Elasticsearch is running     | localhost
    camunda.operate.elasticsearch.port         | Port of Elasticsearch REST API              | 9200
    camunda.operate.elasticsearch.url          | URL of Elasticsearch REST API               | http://localhost:9200
    camunda.operate.elasticsearch.username     | Username to access Elasticsearch REST API   | -
    camunda.operate.elasticsearch.password     | Password to access Elasticsearch REST API   | -

    A snippet from application.yml:

    camunda.operate:
      elasticsearch:
        # Cluster name
        clusterName: elasticsearch
        # Url
        url: http://localhost:9200
    

    Zeebe Broker Connection

    Operate needs a connection to Zeebe Broker to start the import and to execute user operations.

    Settings to connect

    Name                                        | Description                                          | Default value
    camunda.operate.zeebe.brokerContactPoint   | Broker contact point to Zeebe as hostname and port   | localhost:26500

    A snippet from application.yml:

    camunda.operate:
      zeebe:
        # Broker contact point
        brokerContactPoint: localhost:26500
    

    Zeebe Elasticsearch Exporter

    Operate imports data from Elasticsearch indices created and filled by the Zeebe Elasticsearch Exporter. Therefore, settings for this Elasticsearch connection must be defined and must correspond to the settings on the Zeebe side.

    Settings to connect and import:

    Either set host and port (deprecated) or url (recommended)

    Name                                              | Description                                               | Default value
    camunda.operate.zeebeElasticsearch.clusterName   | Cluster name of Elasticsearch                             | elasticsearch
    camunda.operate.zeebeElasticsearch.host          | Hostname where Elasticsearch is running                   | localhost
    camunda.operate.zeebeElasticsearch.port          | Port of Elasticsearch REST API                            | 9200
    camunda.operate.zeebeElasticsearch.url           | URL of Elasticsearch REST API                             | http://localhost:9200
    camunda.operate.zeebeElasticsearch.prefix        | Index prefix as configured in Zeebe Elasticsearch exporter | zeebe-record
    camunda.operate.zeebeElasticsearch.username      | Username to access Elasticsearch REST API                 | -
    camunda.operate.zeebeElasticsearch.password      | Password to access Elasticsearch REST API                 | -

    A snippet from application.yml:

    camunda.operate:
      zeebeElasticsearch:
        # Cluster name
        clusterName: elasticsearch
        # Url
        url: http://localhost:9200
        # Index prefix, configured in Zeebe Elasticsearch exporter
        prefix: zeebe-record
    

    Operation Executor

    Operations are user operations like cancellation of workflow instance(s) or updating a variable value. Operations are executed in a multi-threaded manner.

    Name                                              | Description                       | Default value
    camunda.operate.operationExecutor.threadsCount   | How many threads should be used   | 3

    A snippet from application.yml:

    camunda.operate:
      operationExecutor:
        threadsCount: 3
    

    Monitoring Operate

    Operate includes the Spring Boot Actuator, which provides a number of monitoring possibilities.

    Operate uses the following Actuator configuration by default:

    # enable health check and metrics endpoints
    management.endpoints.web.exposure.include: health,prometheus
    # enable Kubernetes health groups:
    # https://docs.spring.io/spring-boot/docs/current/reference/html/production-ready-features.html#production-ready-kubernetes-probes
    management.health.probes.enabled: true
    

    With this configuration, the following endpoints are available out of the box:

    <server>:8080/actuator/prometheus Prometheus metrics

    <server>:8080/actuator/health/liveness Liveness probe

    <server>:8080/actuator/health/readiness Readiness probe

    Versions before 0.25.0

    In versions before 0.25.0, the management endpoints looked different; therefore we recommend reconfiguring for newer versions.

    Name       | Before 0.25.0      | Starting with 0.25.0
    Readiness  | /api/check         | /actuator/health/readiness
    Liveness   | /actuator/health   | /actuator/health/liveness

    Logging

    Operate uses the Log4j2 framework for logging. The distribution archive, as well as the Docker image, includes the logging configuration file config/log4j2.xml, which can be further adjusted to your needs:

    <?xml version="1.0" encoding="UTF-8"?>
    <Configuration status="WARN" monitorInterval="30">
      <Properties>
        <Property name="LOG_PATTERN">%clr{%d{yyyy-MM-dd HH:mm:ss.SSS}}{faint} %clr{%5p} %clr{${sys:PID}}{magenta} %clr{---}{faint} %clr{[%15.15t]}{faint} %clr{%-40.40c{1.}}{cyan} %clr{:}{faint} %m%n%xwEx</Property>
      </Properties>
      <Appenders>
        <Console name="Console" target="SYSTEM_OUT" follow="true">
          <PatternLayout pattern="${LOG_PATTERN}"/>
        </Console>
    	<Console name="Stackdriver" target="SYSTEM_OUT" follow="true">
          <StackdriverJSONLayout/>
        </Console>
      </Appenders>
      <Loggers>
        <Logger name="org.camunda.operate" level="info" />
        <Root level="info">
          <AppenderRef ref="${env:OPERATE_LOG_APPENDER:-Console}"/>
        </Root>
      </Loggers>
    </Configuration>
    

    By default, the Console log appender is used.

    JSON logging configuration

    You can choose to output logs in JSON format (Stackdriver compatible). To enable it, define the environment variable OPERATE_LOG_APPENDER like this:

    OPERATE_LOG_APPENDER=Stackdriver
    

    An example application.yml file

    The following snippet represents the default Operate configuration, which is shipped with the distribution. It can be found inside the config folder (config/application.yml) and can be used to adjust Operate to your needs.

    # Operate configuration file
    
    camunda.operate:
      # Set operate username and password.
      # If a user with <username> does not exist, it will be created.
      # Default: demo/demo
      #username:
      #password:
      # ELS instance to store Operate data
      elasticsearch:
        # Cluster name
        clusterName: elasticsearch
        # Url
        url: http://localhost:9200
      # Zeebe instance
      zeebe:
        # Broker contact point
        brokerContactPoint: localhost:26500
      # ELS instance to export Zeebe data to
      zeebeElasticsearch:
        # Cluster name
        clusterName: elasticsearch
        # Url
        url: http://localhost:9200
        # Index prefix, configured in Zeebe Elasticsearch exporter
        prefix: zeebe-record
    

    Data Retention

    How the data is stored and archived

    Operate imports data from Zeebe and stores it in Elasticsearch indices with a defined prefix (default: operate). Specifically:

    • deployed workflows, including the diagrams
    • the state of workflow instances, including variables, flow nodes that were activated during instance execution, incidents, etc.

    It additionally stores some Operate specific data:

    • operations performed by the user
    • list of users
    • technical data, like the state of Zeebe import etc.

    The data that represents workflow instance state becomes immutable after the workflow instance is finished. At this moment the data may be archived, meaning that it will be moved to a dated index, e.g. operate_variables_2020-01-01, where the date represents the date on which the given workflow instance was finished. The same is valid for user operations: after they are finished, the related data is moved to dated indices.

    Note: All Operate data present in Elasticsearch (from both "main" and dated indices) will be visible from the UI.

    Data cleanup

    In case of intensive Zeebe usage, the amount of data can grow significantly over time, so you should think about a data cleanup strategy. Dated indices may be safely removed from Elasticsearch. "Safely" here means that only finished workflow instances will be deleted together with all related data, and the rest of the data will stay consistent. You can use Elasticsearch Curator or other tools/scripts to delete old data.

    Attention: Only indices that contain dates in their suffix may be deleted.
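
    As an illustration only, dated indices can be removed with a plain Elasticsearch delete request; the index pattern below is a placeholder, so adjust it to your prefix, schema version, and retention window (and test against non-production data first):

    > curl -X DELETE "http://localhost:9200/operate-*_2020-01-*"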

    Schema & Migration

    Operate stores data in Elasticsearch. On first start Operate will create all required indices and templates.

    Schema

    Operate uses several Elasticsearch indices that are mostly created by using templates.

    Index names follow the defined pattern:

    operate-{datatype}-{schemaversion}_[{date}]
    
    

    where datatype defines which data is stored in the index (e.g. user, variable, etc.), schemaversion represents the version of Operate, and date represents the finished date of archived data (see Data Retention).

    Knowing the index name pattern, it is possible to customize index settings by creating Elasticsearch templates (see Example of an index template). For example, to define the desired number of shards and replicas, you can define the following template:

    PUT _template/template_operate
    {
      "index_patterns": ["operate-*"],
      "settings": {
        "number_of_shards": 5,
        "number_of_replicas": 2
      }
    }
    

    Note: In order for these settings to take effect, the template must be created before Operate's first run.

    Data migration

    The version of Operate is reflected in Elasticsearch object names; e.g. the operate-user-0.24.0_ index contains user data for Operate 0.24.0. When upgrading from one version of Operate to another, a data migration must be performed. The Operate distribution provides an application to perform data migration from previous versions.

    Concept

    The migration uses Elasticsearch processors and pipelines to reindex the data.

    Each version of Operate delivers the set of migration steps that need to be applied for that version of Operate. When upgrading from one version to another, the necessary migration steps constitute the so-called migration plan. All known migration steps (both applied and not) are persisted in a dedicated Elasticsearch index: operate-migration-steps-repository.

    How to migrate

    Migrate by using standalone application

    Make sure that the Elasticsearch instance that contains the Operate data is running. The migration script will connect to Elasticsearch using the connection specified in the Operate configuration (<operate_home>/config/application.yml).

    Execute <operate_home>/bin/migrate (or <operate_home>/bin/migrate.bat for Windows).

    What is expected to happen:

    • the Elasticsearch schema for the new version is created
    • the previous version is detected
    • the migration plan is built and executed, reindexing the data for each index
    • the old indices are deleted

    All known migration steps with metadata will be stored in operate-migration-steps-repository index.

    Note: The old indices will be deleted ONLY after a successful migration. That might require more disk space during the migration process.

    Important! Back up your data before performing the migration (see the snapshot sketch below).
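
    As one possible sketch of such a backup, you can take an Elasticsearch snapshot of the Operate indices before migrating. The repository name and filesystem location below are assumptions and must match the path.repo configuration of your Elasticsearch cluster:

    PUT _snapshot/operate_backup
    {
      "type": "fs",
      "settings": {
        "location": "/mnt/backups/operate"
      }
    }
    
    PUT _snapshot/operate_backup/before-migration?wait_for_completion=true
    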

    Migrate by using built-in automatic upgrade

    When a newer version of Operate is run against an older schema, it will perform the data migration on startup. The migration only happens when exactly ONE previous schema version is detected.

    Further notes

    • If the migration fails, it is OK to retry it. All applied steps are stored, and only those steps that haven't been executed yet will be applied.
    • Operate must not be running while the migration is happening.
    • If the version upgrade is performed in a cluster with several Operate nodes, only one node (Webapp module) must execute the data migration; the others must be stopped and started only after the migration has fully finished.

    Configure migration

    Automatic migration is enabled by default. It can be disabled by setting the configuration key:

    camunda.operate.migration.migrationEnabled = false

    You can specify previous ("source") version with the configuration key:

    camunda.operate.migration.sourceVersion=0.23.0

    If no sourceVersion is defined, Operate tries to detect it from the Elasticsearch indices.
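
    Put together in application.yml, the migration settings could look like the following sketch. The nested YAML form relies on Spring Boot's relaxed binding of the keys above, and the version value is only an example:

    camunda.operate:
      migration:
        migrationEnabled: true
        sourceVersion: 0.23.0
    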

    Example for migrate in Kubernetes

    To ensure that the migration is executed before Operate starts, you can use the init container feature of Kubernetes: the 'main' container will only be started after the initContainer has completed successfully. The following snippet of a Kubernetes pod description shows the usage of the migrate script as an initContainer.

    ...
      labels:
        app: operate
    spec:
       initContainers:
         - name: migration
           image: camunda/operate:0.24.0
           command: ['/bin/sh','/usr/local/operate/bin/migrate']
       containers:
         - name: operate
           image: camunda/operate:0.24.0
           env:
    ...
    

    Importer & Archiver. Scaling Operate

    Operate consists of three modules:

    • Webapp - contains the UI and operation executor functionality
    • Importer - is responsible for importing data from Zeebe
    • Archiver - is responsible for archiving "old" data (finished workflow instances and user operations) (see Data retention).

    Modules can be run together or separately in any combination and can be scaled independently. When you run an Operate instance, all modules are enabled by default. To disable them you can use the following configuration parameters:

    Configuration parameter | Description | Default value
    camunda.operate.importerEnabled | When true, the Importer module is enabled | true
    camunda.operate.archiverEnabled | When true, the Archiver module is enabled | true
    camunda.operate.webappEnabled | When true, the Webapp module is enabled | true

    Additionally, you can run several Importer and Archiver nodes to increase throughput. Internally they will spread their work based on Zeebe partitions.

    E.g. if your Zeebe runs 10 partitions and you configure 2 Importer nodes, each node will import data from 5 partitions. Each Importer/Archiver node must be configured using the following configuration parameters:

    Configuration parameter | Description | Default value
    camunda.operate.clusterNode.partitionIds | Array of Zeebe partition ids this Importer (or Archiver) node is responsible for | empty array, meaning data from all partitions is loaded
    camunda.operate.clusterNode.nodeCount | Total number of Importer (or Archiver) nodes in the cluster | 1
    camunda.operate.clusterNode.currentNodeId | Id of the current Importer (or Archiver) node, starting from 0 | 0

    It is enough to configure either partitionIds or the pair of nodeCount and currentNodeId. If you provide nodeCount and currentNodeId, each node will automatically derive the Zeebe partitions it is responsible for.

    Note: nodeCount always represents the number of nodes of one specific type.

    E.g. configuration of the cluster with 1 Webapp node, 2 Importer nodes and 1 Archiver node could look like this:

    Webapp node
    
    camunda.operate:
      archiverEnabled: false
      importerEnabled: false
      #other configuration...
    
    Importer node #1
    
    camunda.operate:
      archiverEnabled: false
      webappEnabled: false
      clusterNode:
        nodeCount: 2
        currentNodeId: 0
      #other configuration...
      
    Importer node #2
    
    camunda.operate:
      archiverEnabled: false
      webappEnabled: false
      clusterNode:
        nodeCount: 2
        currentNodeId: 1
      #other configuration...
      
    Archiver node
    
    camunda.operate:
      webappEnabled: false
      importerEnabled: false
      
    

    You can further parallelize the Archiver and/or Importer within one node by using the following configuration parameters:

    Configuration parameter | Description | Default value
    camunda.operate.archiver.threadsCount | Number of threads in which data will be archived | 1
    camunda.operate.importer.threadsCount | Number of threads in which data will be imported | 3

    Note: Parallelization of import and archiving within one node also happens based on Zeebe partitions, meaning that only configurations with (number of nodes) * (threadsCount) <= (total number of Zeebe partitions) make sense. More threads and nodes will still work, but some of them will be idle (see the sketch below).
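
    For example, with 10 Zeebe partitions and 2 Importer nodes (5 partitions each), the following sketch keeps every importer thread busy; the numbers are illustrative only and should be adapted to your own partition count:

    camunda.operate:
      importer:
        threadsCount: 5
      archiver:
        threadsCount: 2
    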

    Authentication

    Introduction

    Operate provides three ways to authenticate:

    1. Authenticate with user information stored in Elasticsearch
    2. Authenticate via Auth0 Single Sign-On provider
    3. Authenticate via Lightweight Directory Access Protocol (LDAP)

    By default user storage in Elasticsearch is enabled.

    User in Elasticsearch

    In this mode the user authenticates with a username and password that are stored in Elasticsearch. The username and password for one user may be set in application.yml:

    camunda.operate:
      username: anUser
      password: aPassword
    

    On Operate startup the user will be created if it did not exist before.

    By default one user with username/password demo/demo will be created.

    More users can be added directly to Elasticsearch, to the index operate-user-<version>_. The password must be encoded with the BCrypt strong hashing function (see the sketch below).
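
    As a sketch, a BCrypt hash for such a password can be generated on the command line, e.g. with the htpasswd utility from the Apache tools; whether your Operate version accepts the resulting $2y$ prefix is an assumption you should verify:

    # prints a BCrypt hash of "aPassword" with cost factor 10
    htpasswd -bnBC 10 "" aPassword | tr -d ':\n'
    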

    Auth0 Single Sign-On

    Currently Operate supports Auth0.com implementation of Single Sign-On.

    Enable Single Sign-On

    Single Sign-On may be enabled only by setting Spring profile: sso-auth

    Example for setting the Spring profile as an environment variable:

    export SPRING_PROFILES_ACTIVE=sso-auth
    

    Configure Single Sign-On

    Single Sign-On needs the following parameters (all are mandatory):

    Parameter name | Description
    camunda.operate.auth0.domain | Defines the domain which the user sees
    camunda.operate.auth0.backendDomain | Defines the domain which provides the user information
    camunda.operate.auth0.clientId | Similar to a user name for the application
    camunda.operate.auth0.clientSecret | Similar to a password for the application
    camunda.operate.auth0.claimName | The claim that will be checked by Operate; similar to a permission name
    camunda.operate.auth0.organization | The given organization should be contained in the value of the claim name

    Example for setting parameters as environment variables:

    export CAMUNDA_OPERATE_AUTH0_DOMAIN=A_DOMAIN
    export CAMUNDA_OPERATE_AUTH0_BACKENDDOMAIN=A_BACKEND_DOMAIN
    export CAMUNDA_OPERATE_AUTH0_CLIENTID=A_CLIENT_ID
    export CAMUNDA_OPERATE_AUTH0_CLIENTSECRET=A_SECRET
    export CAMUNDA_OPERATE_AUTH0_CLAIMNAME=A_CLAIM
    export CAMUNDA_OPERATE_AUTH0_ORGANIZATION=AN_ORGANIZATION
    

    LDAP

    Enable LDAP

    LDAP can be enabled only by setting Spring profile: ldap-auth

    Example for setting the Spring profile as an environment variable:

    export SPRING_PROFILES_ACTIVE=ldap-auth
    

    Configuration of LDAP

    A user can authenticate via LDAP. The following parameters for a connection to an LDAP server should be given:

    Parameter name | Description | Example | Required
    camunda.operate.ldap.url | URL of the LDAP server | ldaps://camunda.com/ | yes
    camunda.operate.ldap.baseDn | Base domain name | dc=camunda,dc=com | yes
    camunda.operate.ldap.managerDn | Manager DN, used by Operate to log into the LDAP server and retrieve user information | cn=admin,dc=camunda,dc=com | yes
    camunda.operate.ldap.managerPassword | Password for the manager | | yes
    camunda.operate.ldap.userSearchFilter | Filter to retrieve user info; the pattern '{0}' will be replaced by the username given in the login form | {0} | no, default is {0}
    camunda.operate.ldap.userSearchBase | Starting point for the search | ou=Support,dc=camunda,dc=com | no
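
    Putting the parameters above together, a minimal application.yml sketch for LDAP could look like the following; all values are the examples from the table and must be replaced with your own, and the ldap-auth Spring profile must be active for them to be picked up:

    camunda.operate:
      ldap:
        url: ldaps://camunda.com/
        baseDn: dc=camunda,dc=com
        managerDn: cn=admin,dc=camunda,dc=com
        managerPassword: aSecretPassword
        userSearchBase: ou=Support,dc=camunda,dc=com
    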

    Configuration of Active Directory based LDAP

    For an Active Directory based LDAP server, the following parameters should be given:

    Note: The Active Directory configuration is only applied when camunda.operate.ldap.domain is given.

    Parameter name | Description | Required
    camunda.operate.ldap.url | URL of the Active Directory LDAP server | yes
    camunda.operate.ldap.domain | Domain | yes
    camunda.operate.ldap.baseDn | Root domain name | no
    camunda.operate.ldap.userSearchFilter | Used as the search filter | no
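
    A corresponding Active Directory sketch, with purely hypothetical example values:

    camunda.operate:
      ldap:
        url: ldaps://ad.example.com/
        domain: example.com
        baseDn: dc=example,dc=com
    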

    Tasklist User Guide

    Tasklist is a tool for ... workflow instances running in Zeebe.

    In the Getting Started tutorial, we walk through how to install and run Tasklist and how to use it. In this Tasklist User Guide, we’ll cover some of Tasklist’s more advanced features.

    Tasklist is also available for production use (with support) in the Camunda Cloud offering. If you'd like to try out Tasklist in Camunda Cloud, please sign up here.

    Install & Start Tasklist

    Getting Familiar With Tasklist

    Tasklist Deployment Guide

    To get started with Tasklist please follow the Tasklist User Guide.

    This section describes in more detail how Tasklist can be configured.

    Introduction

    Tasklist is a Spring Boot application. That means all ways to configure a Spring Boot application can be applied. By default, the configuration for Tasklist is stored in a YAML file application.yml. All Tasklist related settings are prefixed with zeebe.tasklist. The following parts are configurable:

    Configurations

    Elasticsearch

    Tasklist stores and reads data in/from Elasticsearch.

    Settings to connect

    Tasklist supports basic authentication for Elasticsearch. To use it, set the appropriate username/password combination in the configuration.

    Name | Description | Default value
    zeebe.tasklist.elasticsearch.clusterName | Cluster name of Elasticsearch | elasticsearch
    zeebe.tasklist.elasticsearch.url | URL of Elasticsearch REST API | http://localhost:9200
    zeebe.tasklist.elasticsearch.username | Username to access Elasticsearch REST API | -
    zeebe.tasklist.elasticsearch.password | Password to access Elasticsearch REST API | -

    A snippet from application.yml:

    zeebe.tasklist:
      elasticsearch:
        # Cluster name
        clusterName: elasticsearch
        # Url
        url: http://localhost:9200
    
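
    If your Elasticsearch requires basic authentication, the username and password parameters from the table above can be added to the same block; the credentials below are placeholders:

    zeebe.tasklist:
      elasticsearch:
        clusterName: elasticsearch
        url: http://localhost:9200
        username: aUser
        password: aPassword
    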

    Zeebe Broker Connection

    Tasklist needs a connection to the Zeebe broker to start the import.

    Settings to connect

    Name | Description | Default value
    zeebe.tasklist.zeebe.brokerContactPoint | Broker contact point to Zeebe as hostname and port | localhost:26500

    A snippet from application.yml:

    zeebe.tasklist:
      zeebe:
        # Broker contact point
        brokerContactPoint: localhost:26500
    

    Zeebe Elasticsearch Exporter

    Tasklist imports data from Elasticsearch indices created and filled by the Zeebe Elasticsearch Exporter. Therefore, settings for this Elasticsearch connection must be defined and must correspond to the settings on the Zeebe side.

    Settings to connect and import:

    Name | Description | Default value
    zeebe.tasklist.zeebeElasticsearch.clusterName | Cluster name of Elasticsearch | elasticsearch
    zeebe.tasklist.zeebeElasticsearch.url | URL of Elasticsearch REST API | http://localhost:9200
    zeebe.tasklist.zeebeElasticsearch.prefix | Index prefix as configured in Zeebe Elasticsearch exporter | zeebe-record
    zeebe.tasklist.zeebeElasticsearch.username | Username to access Elasticsearch REST API | -
    zeebe.tasklist.zeebeElasticsearch.password | Password to access Elasticsearch REST API | -

    A snippet from application.yml:

    zeebe.tasklist:
      zeebeElasticsearch:
        # Cluster name
        clusterName: elasticsearch
        # Url
        url: http://localhost:9200
        # Index prefix, configured in Zeebe Elasticsearch exporter
        prefix: zeebe-record
    

    Monitoring and health probes

    Tasklist includes Spring Boot Actuator, which provides a number of monitoring possibilities, e.g. health check (http://localhost:8080/actuator/health) and metrics (http://localhost:8080/actuator/prometheus) endpoints.

    Tasklist uses the following Actuator configuration by default:

    # enable health check and metrics endpoints
    management.endpoints.web.exposure.include: health,prometheus
    # enable Kubernetes health groups:
    # https://docs.spring.io/spring-boot/docs/current/reference/html/production-ready-features.html#production-ready-kubernetes-probes
    management.health.probes.enabled: true
    

    With this configuration, the following endpoints are available out of the box:

    <server>:8080/actuator/prometheus Prometheus metrics

    <server>:8080/actuator/health/liveness Liveness probe

    <server>:8080/actuator/health/readiness Readiness probe

    Example snippets to use Tasklist probes in Kubernetes:

    For details on setting Kubernetes probe parameters, see: Kubernetes configure probes

    Readiness probe as yaml config:

    readinessProbe:
         httpGet:
            path: /actuator/health/readiness
            port: 8080
         initialDelaySeconds: 30
         periodSeconds: 30
    

    Liveness probe as yaml config:

    livenessProbe:
         httpGet:
            path: /actuator/health/liveness
            port: 8080
         initialDelaySeconds: 30
         periodSeconds: 30
    

    Logging

    Tasklist uses the Log4j2 framework for logging. The logging configuration file config/log4j2.xml is included in the distribution archive as well as in the Docker image and can be further adjusted to your needs:

    <?xml version="1.0" encoding="UTF-8"?>
    <Configuration status="WARN" monitorInterval="30">
      <Properties>
        <Property name="LOG_PATTERN">%clr{%d{yyyy-MM-dd HH:mm:ss.SSS}}{faint} %clr{%5p} %clr{${sys:PID}}{magenta} %clr{---}{faint} %clr{[%15.15t]}{faint} %clr{%-40.40c{1.}}{cyan} %clr{:}{faint} %m%n%xwEx</Property>
      </Properties>
      <Appenders>
        <Console name="Console" target="SYSTEM_OUT" follow="true">
          <PatternLayout pattern="${LOG_PATTERN}"/>
        </Console>
    	<Console name="Stackdriver" target="SYSTEM_OUT" follow="true">
          <StackdriverJSONLayout/>
        </Console>
      </Appenders>
      <Loggers>
        <Logger name="io.zeebe.tasklist" level="info" />
        <Root level="info">
          <AppenderRef ref="${env:TASKLIST_LOG_APPENDER:-Console}"/>
        </Root>
      </Loggers>
    </Configuration>
    

    By default Console log appender will be used.

    JSON logging configuration

    You can choose to output logs in JSON format (Stackdriver compatible). To enable it, define the environment variable TASKLIST_LOG_APPENDER like this:

    TASKLIST_LOG_APPENDER=Stackdriver
    
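
    For example, when starting Tasklist from the distribution, the variable can be exported before launching the start script; the script name bin/tasklist is an assumption based on the distribution layout and may differ in your setup:

    export TASKLIST_LOG_APPENDER=Stackdriver
    ./bin/tasklist
    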

    An example of application.yml file

    The following snippet represents the default Tasklist configuration, which is shipped with the distribution. It can be found inside the config folder (config/application.yml) and can be used to adjust Tasklist to your needs.

    # Tasklist configuration file
    
    zeebe.tasklist:
      # Set Tasklist username and password.
      # If user with <username> does not exists it will be created.
      # Default: demo/demo
      #username:
      #password:
      # ELS instance to store Tasklist data
      elasticsearch:
        # Cluster name
        clusterName: elasticsearch
        # Url
        url: http://localhost:9200
      # Zeebe instance
      zeebe:
        # Broker contact point
        brokerContactPoint: localhost:26500
      # ELS instance to export Zeebe data to
      zeebeElasticsearch:
        # Cluster name
        clusterName: elasticsearch
        # Url
        url: http://localhost:9200
        # Index prefix, configured in Zeebe Elasticsearch exporter
        prefix: zeebe-record
    

    Authentication

    Introduction

    Tasklist provides two ways to authenticate:

    1. Authenticate with user information stored in Elasticsearch
    2. Authenticate via Auth0 Single Sign-On provider

    By default user storage in Elasticsearch is enabled.

    User in Elasticsearch

    In this mode the user authenticates with a username and password that are stored in Elasticsearch. The username and password for one user may be set in application.yml:

    zeebe.tasklist:
      username: anUser
      password: aPassword
    

    On Tasklist startup the user will be created if it did not exist before.

    By default one user with username/password demo/demo will be created.

    More users can be added directly to Elasticsearch, to the index tasklist-user-<version>_. The password must be encoded with the BCrypt strong hashing function.

    Auth0 Single Sign-On

    Currently Tasklist supports Auth0.com implementation of Single Sign-On.

    Enable Single Sign-On

    Single Sign-On may be enabled only by setting Spring profile: sso-auth

    Example for setting the Spring profile as an environment variable:

    export SPRING_PROFILES_ACTIVE=sso-auth
    

    Configure Single Sign-On

    Single Sign-On needs the following parameters (all are mandatory):

    Parameter name | Description
    zeebe.tasklist.auth0.domain | Defines the domain which the user sees
    zeebe.tasklist.auth0.backendDomain | Defines the domain which provides the user information
    zeebe.tasklist.auth0.clientId | Similar to a user name for the application
    zeebe.tasklist.auth0.clientSecret | Similar to a password for the application
    zeebe.tasklist.auth0.claimName | The claim that will be checked by Tasklist; similar to a permission name
    zeebe.tasklist.auth0.organization | The given organization should be contained in the value of the claim name

    Example for setting parameters as environment variables:

    export ZEEBE_TASKLIST_AUTH0_DOMAIN=A_DOMAIN
    export ZEEBE_TASKLIST_AUTH0_BACKENDDOMAIN=A_BACKEND_DOMAIN
    export ZEEBE_TASKLIST_AUTH0_CLIENTID=A_CLIENT_ID
    export ZEEBE_TASKLIST_AUTH0_CLIENTSECRET=A_SECRET
    export ZEEBE_TASKLIST_AUTH0_CLAIMNAME=A_CLAIM
    export ZEEBE_TASKLIST_AUTH0_ORGANIZATION=AN_ORGANIZATION
    

    Glossary

    This section defines commonly used terminology referenced within the Zeebe documentation.

    Broker

    A broker is an instance of a Zeebe installation which executes workflows and manages workflow state. A single broker will be installed on a single machine.

    Client

    A client interacts with the Zeebe broker on behalf of the business application. Clients poll for work from the broker.

    Cluster

    A cluster represents a configuration of one or more brokers collaborating to execute workflows. Each broker in a cluster acts as a leader or a follower.

    Command

    A command represents an action to be taken or executed. Example commands include: deploy a workflow, execute a workflow, etc.

    Correlation

    Correlation refers to the act of matching a message with an inflight workflow instance.

    Correlation Key

    A correlation key is an attribute within a message which is used to match the message against a certain variable within an inflight workflow instance. If the value of the correlation key matches the value of the variable within the workflow instance, the message is matched to this workflow instance.

    Deployment

    A workflow cannot execute unless it is known by the broker. Deployment is the process of pushing or deploying workflows to the broker.

    Event

    An event represents a state change associated with an aspect of an executing workflow instance. Events capture variable changes, state transitions in workflow elements, etc. An event is represented by a timestamp, the variable name and the variable value. Events are stored in an append-only log.

    Exporter

    An exporter represents a sink to which Zeebe will submit all records within the log. This gives users of Zeebe an opportunity to persist records from the log for future use, as this data will not be available after log compaction.

    Follower

    In a clustered environment, a broker which is not a leader is a follower of a given partition. A follower can become the new leader when the old leader is no longer reachable.

    Gateway

    Clients communicate with the Zeebe cluster through a gateway. The gateway provides a gRPC API and forwards client commands to the cluster. Depending on the setup, a gateway can be embedded in the broker or can be configured to be standalone.

    Incident

    An incident represents an error condition which prevents Zeebe from advancing an executing workflow instance. Zeebe will create an incident if there was an uncaught exception thrown in your code and the number of retries of the given step has been exceeded.

    Job

    A job represents a distinct unit of work within a business process. Service tasks represent such jobs in your workflow and are identified by a unique id. A job has a type to allow specific job workers to find jobs that they can work on.

    Job Activation Timeout

    This is the amount of time the broker waits, after submitting a job to a job worker for processing, for a complete or fail response before it marks the job as available again for other job workers.

    Job Worker

    A special type of client that polls for and executes available jobs. An uncompleted job prevents Zeebe from advancing workflow execution to the next step.

    Leader

    In a clustered environment, one broker, the leader, is responsible for workflow execution and housekeeping of data within a partition. Housekeeping includes taking snapshots, replication and running exports.

    Log

    The log consists of an ordered sequence of records written to persistent storage. The log is append-only and is stored on disk within the broker.

    Message

    A message contains information to be delivered to interested parties during execution of a workflow instance. Messages can be published via Kafka or Zeebe’s internal messaging system. Messages are associated with a timestamp and other constraints such as time-to-live (TTL).

    Partition

    A partition represents a logical grouping of data in a Zeebe broker. This data includes workflow instance variables stored in RocksDB as well as the commands and events generated by Zeebe and stored in the log. The number of partitions is defined by configuration.

    Record

    A record represents a command or an event. For example, a command to create a new workflow instance, or a state transition of an executing workflow instance representing an event at a given point in time, results in the generation of a record. During the execution lifecycle of a workflow instance, numerous records are generated to capture the various commands and events produced. Records are stored in the log.

    Replication

    Replication is the act of copying data in a partition from a leader to its followers within a clustered Zeebe installation. After replication, the leader and followers of a partition will have the exact same data. Replication allows the system to be resilient to brokers going down.

    Replication Factor

    This is the number of times data in a partition will be copied and this depends on the number of brokers in a cluster. A cluster with one leader and two followers will have a replication factor of three, as data in each partition needs to have three copies.

    Request Timeout

    This is how long a client will wait for a response from the broker after the client has submitted a request. If a response is not received within the client request timeout, the client can consider the broker unreachable.

    Snapshot

    The state of all active workflow instances (also known as inflight workflow instances) is stored as records in an embedded database called RocksDB. A snapshot represents a copy of all data within this database at a given point in time. Snapshots are binary images stored on disk and can be used to restore the execution state of a workflow. The size of a snapshot is affected by the size of the data, which in turn depends on several factors, including the complexity of the model or business process, the size and quantity of variables in each workflow instance, as well as the total number of executing workflow instances in a broker.

    Segment

    The log consists of one or more segments. Each segment is a file that contains an ordered sequence of records. Segments are deleted when the log is compacted.

    Worker

    A worker executes a job. In the Zeebe nomenclature, these are also referred to as job workers.

    Workflow

    A workflow is a defined sequence of distinct steps representing your business logic. Examples of a workflow could be an e-commerce shopping experience, onboarding a new employee, etc. In Zeebe, workflows are identified by a unique process id. The workflow is usually also referred to as the BPMN model.

    Workflow Instance

    While a workflow represents a defined sequence of distinct steps representing your business logic, a workflow instance represents a currently executing or completed instance of that workflow. For a single workflow, there can be many associated workflow instances in various stages of their execution lifecycle. Workflow instances are identified by a workflow instance id. Executing workflow instances are also sometimes referred to as inflight workflows.

    Workflow Instance Variable

    A workflow instance variable represents the execution state (i.e. data) of a workflow instance. These variables capture business process parameters which are input and output of various stages of the workflow instance and which also influence process flow execution.

    Appendix

    Broker Configuration Templates

    This section references the default Zeebe standalone broker configuration templates, which are shipped with the distribution. They can be found inside the config folder and can be used to adjust Zeebe to your needs.

    Default Standalone Broker Configuration

    The default configuration contains the most common configuration options.

    # Zeebe Standalone Broker configuration file (with embedded gateway)
    
    # This file is based on broker.standalone.yaml.template but stripped down to contain only a limited
    # set of configuration options. These are a good starting point to get to know Zeebe.
    # For advanced configuration options, have a look at the templates in this folder.
    
    # !!! Note that this configuration is not suitable for running a standalone gateway. !!!
    # If you want to run a standalone gateway node, please have a look at gateway.yaml.template
    
    # ----------------------------------------------------
    # Byte sizes
    # For buffers and others must be specified as strings and follow the following
    # format: "10U" where U (unit) must be replaced with KB = Kilobytes, MB = Megabytes or GB = Gigabytes.
    # If unit is omitted then the default unit is simply bytes.
    # Example:
    # sendBufferSize = "16MB" (creates a buffer of 16 Megabytes)
    #
    # Time units
    # Timeouts, intervals, and the likes, must be specified either in the standard ISO-8601 format used
    # by java.time.Duration, or as strings with the following format: "VU", where:
    #   - V is a numerical value (e.g. 1, 5, 10, etc.)
    #   - U is the unit, one of: ms = Millis, s = Seconds, m = Minutes, or h = Hours
    #
    # Paths:
    # Relative paths are resolved relative to the installation directory of the broker.
    # ----------------------------------------------------
    
    zeebe:
      broker:
        gateway:
          # Enable the embedded gateway to start on broker startup.
          # This setting can also be overridden using the environment variable ZEEBE_BROKER_GATEWAY_ENABLE.
          enable: true
    
          network:
            # Sets the port the embedded gateway binds to.
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_GATEWAY_NETWORK_PORT.
            port: 26500
    
          security:
            # Enables TLS authentication between clients and the gateway
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_GATEWAY_SECURITY_ENABLED.
            enabled: false
    
        network:
          # Controls the default host the broker should bind to. Can be overwritten on a
          # per binding basis for client, management and replication
          # This setting can also be overridden using the environment variable ZEEBE_BROKER_NETWORK_HOST.
          host: 0.0.0.0
    
        data:
          # Specify a list of directories in which data is stored.
          # This setting can also be overridden using the environment variable ZEEBE_BROKER_DATA_DIRECTORIES.
          directories: [ data ]
          # The size of data log segment files.
          # This setting can also be overridden using the environment variable ZEEBE_BROKER_DATA_LOGSEGMENTSIZE.
          logSegmentSize: 512MB
          # How often we take snapshots of streams (time unit)
          # This setting can also be overridden using the environment variable ZEEBE_BROKER_DATA_SNAPSHOTPERIOD.
          snapshotPeriod: 15m
    
        cluster:
          # Specifies the Zeebe cluster size.
          # This can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_CLUSTERSIZE.
          clusterSize: 1
          # Controls the replication factor, which defines the count of replicas per partition.
          # This can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_REPLICATIONFACTOR.
          replicationFactor: 1
          # Controls the number of partitions, which should exist in the cluster.
          # This can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_PARTITIONSCOUNT.
          partitionsCount: 1
    
        threads:
          # Controls the number of non-blocking CPU threads to be used.
          # WARNING: You should never specify a value that is larger than the number of physical cores
          # available. Good practice is to leave 1-2 cores for ioThreads and the operating
          # system (it has to run somewhere). For example, when running Zeebe on a machine
          # which has 4 cores, a good value would be 2.
          # This setting can also be overridden using the environment variable ZEEBE_BROKER_THREADS_CPUTHREADCOUNT
          cpuThreadCount: 2
          # Controls the number of io threads to be used.
          # This setting can also be overridden using the environment variable ZEEBE_BROKER_THREADS_IOTHREADCOUNT
          ioThreadCount: 2
    

    Standalone Broker (with embedded Gateway)

    # Zeebe Standalone Broker configuration file (with embedded gateway)
    
    # ! ! ! ! ! ! ! ! ! !
    # In order to activate the settings in this file, rename this file to application.yaml.
    # ! ! ! ! ! ! ! ! ! !
    
    # Overview -------------------------------------------
    
    # This file contains a complete list of available configuration options.
    
    # This file shows example values for configuring several exporters. To enable an exporter
    # please uncomment the whole block and overwrite the settings.
    
    # Conventions:
    #
    # Byte sizes
    # For buffers and others must be specified as strings and follow the following
    # format: "10U" where U (unit) must be replaced with KB = Kilobytes, MB = Megabytes or GB = Gigabytes.
    # If unit is omitted then the default unit is simply bytes.
    # Example:
    # sendBufferSize = "16MB" (creates a buffer of 16 Megabytes)
    #
    # Time units
    # Timeouts, intervals, and the likes, must be specified either in the standard ISO-8601 format used
    # by java.time.Duration, or as strings with the following format: "VU", where:
    #   - V is a numerical value (e.g. 1, 5, 10, etc.)
    #   - U is the unit, one of: ms = Millis, s = Seconds, m = Minutes, or h = Hours
    #
    # Paths:
    # Relative paths are resolved relative to the installation directory of the broker.
    # ----------------------------------------------------
    
    # zeebe:
      # broker:
        # Sets the timeout for each start and closing step.
        #
        # Broker bootstrap and closing is divided in several individual steps.
        # Each step should take at max the defined stepTimeout, otherwise the bootstrap is aborted.
        #
        # This setting can also be overridden using the environment variable ZEEBE_BROKER_STEPTIMEOUT.
        # stepTimeout: 5m
    
        # gateway:
          # Enable the embedded gateway to start on broker startup.
          # This setting can also be overridden using the environment variable ZEEBE_BROKER_GATEWAY_ENABLE.
          # enable: true
    
          # network:
            # Sets the host the embedded gateway binds to.
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_GATEWAY_NETWORK_HOST.
            # host: 0.0.0.0
    
            # Sets the port the embedded gateway binds to.
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_GATEWAY_NETWORK_PORT.
            # port: 26500
    
            # Sets the minimum keep alive interval
            # This setting specifies the minimum accepted interval between keep alive pings.
            # This value must be specified as a positive integer followed by 's' for seconds, 'm' for minutes or 'h' for hours.
            # This setting can also be overwritten using the environment variable ZEEBE_BROKER_GATEWAY_NETWORK_MINKEEPALIVEINTERVAL.
            # minKeepAliveInterval: 30s
    
          # cluster:
            # Sets the timeout of requests send to the broker cluster
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_GATEWAY_CLUSTER_REQUESTTIMEOUT.
            # requestTimeout: 15s
    
          # threads:
            # Sets the number of threads the gateway will use to communicate with the broker cluster
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_GATEWAY_THREADS_MANAGEMENTTHREADS.
            # managementThreads: 1
    
          # monitoring:
            # Enables the metrics collection in the gateway
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_GATEWAY_MONITORING_ENABLED.
            # enabled: false
    
          # security:
            # Enables TLS authentication between clients and the gateway
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_GATEWAY_SECURITY_ENABLED.
            # enabled: false
    
            # Sets the path to the certificate chain file
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_GATEWAY_SECURITY_CERTIFICATECHAINPATH.
            # certificateChainPath:
    
            # Sets the path to the private key file location
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_GATEWAY_SECURITY_PRIVATEKEYPATH.
            # privateKeyPath:
    
          # longPolling:
            # Enables long polling for available jobs
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_GATEWAY_LONGPOLLING_ENABLED.
            # enabled: true
    
        # network:
          # This section contains the network configuration. Particularly, it allows to
          # configure the hosts and ports the broker should bind to. The broker exposes three sockets:
          # 1. command: the socket which is used for gateway-to-broker communication
          # 2. internal: the socket which is used for broker-to-broker communication
          # 3. monitoring: the socket which is used to monitor the broker
    
          # Controls the default host the broker should bind to. Can be overwritten on a
          # per binding basis for client, management and replication
          #
          # This setting can also be overridden using the environment variable ZEEBE_BROKER_NETWORK_HOST.
          # host: 0.0.0.0
    
          # Controls the advertised host; if omitted defaults to the host. This is particularly useful if your
          # broker stands behind a proxy.
          # This setting can also be overridden using the environment variable ZEEBE_BROKER_NETWORK_ADVERTISEDHOST.
          # advertisedHost: 0.0.0.0
    
          # If a port offset is set it will be added to all ports specified in the config
          # or the default values. This is a shortcut to not always specifying every port.
          #
          # The offset will be added to the second last position of the port, as Zeebe
          # requires multiple ports. As example a portOffset of 5 will increment all ports
          # by 50, i.e. 26500 will become 26550 and so on.
          #
          # This setting can also be overridden using the environment variable ZEEBE_BROKER_NETWORK_PORTOFFSET.
          # portOffset: 0
    
          # Sets the maximum size of the incoming and outgoing messages (i.e. commands and events).
          # This setting can also be overridden using the environment variable ZEEBE_BROKER_NETWORK_MAXMESSAGESIZE.
          # maxMessageSize: 4MB
    
          # commandApi:
            # Overrides the host used for gateway-to-broker communication
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_NETWORK_COMMANDAPI_HOST.
            # host: 0.0.0.0
    
            # Sets the port used for gateway-to-broker communication
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_NETWORK_COMMANDAPI_PORT.
            # port: 26501
    
          # internalApi:
            # Overrides the host used for internal broker-to-broker communication
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_NETWORK_INTERNALAPI_HOST.
            # host: 0.0.0.0
    
            # Sets the port used for internal broker-to-broker communication
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_NETWORK_INTERNALAPI_PORT.
            # port: 26502
    
          # monitoringApi:
            # Overrides the host used for exposing monitoring information
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_NETWORK_MONITORINGAPI_HOST.
            # host: 0.0.0.0
    
            # Sets the port used for exposing monitoring information
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_NETWORK_MONITORINGAPI_PORT.
            # port: 9600
    
        # data:
          # This section allows to configure Zeebe's data storage. Data is stored in
          # "partition folders". A partition folder has the following structure:
          #
          # partition-0                       (root partition folder)
          # ├── partition.json                (metadata about the partition)
          # ├── segments                      (the actual data as segment files)
          # │   ├── 00.data
          # │   └── 01.data
          # └── state                     	(stream processor state and snapshots)
          #     └── stream-processor
          #		  ├── runtime
          #		  └── snapshots
    
          # Specify a list of directories in which data is stored. Using multiple
          # directories makes sense in case the machine which is running Zeebe has
          # multiple disks which are used in a JBOD (just a bunch of disks) manner. This
          # allows to get greater throughput in combination with a higher io thread count
          # since writes to different disks can potentially be done in parallel.
          #
          # This setting can also be overridden using the environment variable ZEEBE_BROKER_DATA_DIRECTORIES.
          # directories: [ data ]
    
          # The size of data log segment files.
          # This setting can also be overridden using the environment variable ZEEBE_BROKER_DATA_LOGSEGMENTSIZE.
          # logSegmentSize: 512MB
    
          # How often we take snapshots of streams (time unit)
          # This setting can also be overridden using the environment variable ZEEBE_BROKER_DATA_SNAPSHOTPERIOD.
          # snapshotPeriod: 15m
    
          # When the disk usage is above this value all client commands will be rejected.
          # The value is specified as a percentage of the total disk space.
          # The value should be in the range (0, 1).
          # This setting can also be overridden using the environment variable ZEEBE_BROKER_DATA_DISKUSAGECOMMANDWATERMARK
          # diskUsageCommandWatermark = 0.97
    
          # When the disk usage is above this value, this broker will stop writing replicated events it receives from other brokers.
          # The value is specified as a percentage of the total disk space.
          # The value should be in the range (0, 1).
          # It is recommended that diskUsageReplicationWatermark > diskUsageCommandWatermark
          # This setting can also be overridden using the environment variable ZEEBE_BROKER_DATA_DISKUSAGEREPLICATIONWATERMARK
          # diskUsageReplicationWatermark = 0.99
    
          # Sets the interval at which the disk usage is monitored
          # This setting can also be overridden using the environment variable ZEEBE_BROKER_DATA_DISKUSAGEMONITORINGINTERVAL
          # diskUsageMonitoringInterval = 1s
    
          # rocksdb:
            # Specify custom column family options overwriting Zeebe's own defaults.
            # WARNING: This setting requires in-depth knowledge of Zeebe's embedded database: RocksDB.
            # The expected property key names and values are derived from RocksDB's C implementation,
            # and are not limited to the provided examples below. Please look in RocksDB's SCM repo
            # for the files: `cf_options.h` and `options_helper.cc`. This setting can also be overridden
            # using the environment variable ZEEBE_BROKER_DATA_ROCKSDB_COLUMNFAMILYOPTIONS_{PROPERTY_KEY_NAME}
            # For example, `write_buffer_size` can be set using `ZEEBE_BROKER_DATA_ROCKSDB_COLUMNFAMILYOPTIONS_WRITE_BUFFER_SIZE`.
            # columnFamilyOptions:
              # compaction_pri: "kOldestSmallestSeqFirst"
              # write_buffer_size: 67108864
    
        # cluster:
          # This section contains all cluster related configurations, to setup a zeebe cluster
    
          # Specifies the unique id of this broker node in a cluster.
          # The id should be between 0 and number of nodes in the cluster (exclusive).
          #
          # This setting can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_NODEID.
          # nodeId: 0
    
          # Controls the number of partitions, which should exist in the cluster.
          #
          # This can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_PARTITIONSCOUNT.
          # partitionsCount: 1
    
          # Controls the replication factor, which defines the count of replicas per partition.
          # The replication factor cannot be greater than the number of nodes in the cluster.
          #
          # This can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_REPLICATIONFACTOR.
          # replicationFactor: 1
    
          # Specifies the zeebe cluster size. This value is used to determine which broker
          # is responsible for which partition.
          #
          # This can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_CLUSTERSIZE.
          # clusterSize: 1
    
          # Allows to specify a list of known other nodes to connect to on startup
          # The contact points of the internal network configuration must be specified.
          # The format is [HOST:PORT]
          # Example:
          # initialContactPoints : [ 192.168.1.22:26502, 192.168.1.32:26502 ]
          #
          # To guarantee the cluster can survive network partitions, all nodes must be specified
          # as initial contact points.
          #
          # This setting can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_INITIALCONTACTPOINTS
          # specifying a comma-separated list of contact points.
          # Default is empty list:
          # initialContactPoints: []
    
          # Allows to specify a name for the cluster
          # This setting can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_CLUSTERNAME.
          # Example:
          # clusterName: zeebe-cluster
    
          # Configure parameters for SWIM protocol which is used to propagate cluster membership
          # information among brokers and gateways
          # membership:
    
            # Configure whether to broadcast member updates to all members.
            # If set to false updates will be gossiped among the members.
            # If set to true the network traffic may increase but it reduce the time to detect membership changes.
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_MEMBERSHIP_BROADCASTUPDATES
            # broadcastUpdates: false
    
            # Configure whether to broadcast disputes to all members.
            # If set to true the network traffic may increase but it reduce the time to detect membership changes.
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_MEMBERSHIP_BROADCASTDISPUTES
            # broadcastDisputes: true
    
            # Configure whether to notify a suspect node on state changes.
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_MEMBERSHIP_NOTIFYSUSPECT
            # notifySuspect: false
    
            # Sets the interval at which the membership updates are sent to a random member.
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_MEMBERSHIP_GOSSIPINTERVAL
            # gossipInterval: 250ms
    
            # Sets the number of members to which membership updates are sent at each gossip interval.
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_MEMBERSHIP_GOSSIPFANOUT
            # gossipFanout: 2
    
            # Sets the interval at which to probe a random member.
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_MEMBERSHIP_PROBEINTERVAL
            # probeInterval: 1s
    
            # Sets the timeout for a probe response
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_MEMBERSHIP_PROBETIMEOUT
            # probeTimeout: 2s
    
            # Sets the number of probes failed before declaring a member is suspect
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_MEMBERSHIP_SUSPECTPROBES
            # suspectProbes: 3
    
            # Sets the timeout for a suspect member is declared dead.
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_MEMBERSHIP_FAILURETIMEOUT
            # failureTimeout: 10s
    
            # Sets the interval at which this member synchronizes its membership information with a random member.
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_MEMBERSHIP_SYNCINTERVAL
            # syncInterval: 10s
    
        # threads:
          # Controls the number of non-blocking CPU threads to be used. WARNING: You
          # should never specify a value that is larger than the number of physical cores
          # available. Good practice is to leave 1-2 cores for ioThreads and the operating
          # system (it has to run somewhere). For example, when running Zeebe on a machine
          # which has 4 cores, a good value would be 2.
          #
          # This setting can also be overridden using the environment variable ZEEBE_BROKER_THREADS_CPUTHREADCOUNT
          # cpuThreadCount: 2
    
          # Controls the number of io threads to be used. These threads are used for
          # workloads that write data to disk. While writing, these threads are blocked
          # which means that they yield the CPU.
          #
          # This setting can also be overridden using the environment variable ZEEBE_BROKER_THREADS_IOTHREADCOUNT
          # ioThreadCount: 2
    
        # backpressure:
          # Configure backpressure below.
          #
          # Set this to enable or disable backpressure. When enabled the broker rejects user requests when
          # the number of inflight requests is greater than than the "limit". The value of the "limit" is determined
          # based on the configured algorithm.
          # This setting can also be overridden using the environment variable ZEEBE_BROKER_BACKPRESSURE_ENABLED
          # enabled : true
    
          # if enabled - will use the average latencies over a window as the current latency to update the limit.
          # It is not recommended to enable this when the algorithm is aimd. This setting is not applicable to fixed limit.
          # This setting can also be overridden using the environment variable ZEEBE_BROKER_BACKPRESSURE_USEWINDOWED
          # useWindowed: true
    
          # The algorithm configures which algorithm to use for the backpressure.
          # It should be one of vegas, aimd, fixed, gradient, or gradient2.
          # This setting can also be overridden using the environment variable ZEEBE_BROKER_BACKPRESSURE_ALGORITHM
          # algorithm: "fixed"
    
          # Configure the parameters for "aimd" algorithm.
          # AIMD increases the limit for every successful response and decrease the limit for every request timeout.
          # aimd:
            # The limit will be reduced if the observed latency is greater than the requestTimeout.
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_BACKPRESSURE_AIMD_REQUESTTIMEOUT
            # requestTimeout: "1s"
    
            # The initial limit to be used when the broker starts. The limit will be reset to this value when the broker restarts.
            # This setting can also be overridden using the environment ZEEBE_BROKER_BACKPRESSURE_AIMD_INITIALLIMIT
            # initialLimit: 100
    
            # The minimum limit. This setting can also be overridden using the environment variable ZEEBE_BROKER_BACKPRESSURE_AIMD_MINLIMIT
            # minLimit: 1
    
            # The maximum limit. This setting can also be overridden using the environment variable ZEEBE_BROKER_BACKPRESSURE_AIMD_MAXLIMIT
            # maxLimit: 1000
    
            # The backoffRatio is a double value x such that 0 <  x  < 1. It determines the factor by which the limit is decreased.
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_BACKPRESSURE_AIMD_BACKOFFRATIO
            # backoffRatio: 0.9
    
          # Configure the parameters for "fixed" algorithm
          # fixed:
            # Set a fixed limit. This setting can also be overridden using the environment variable ZEEBE_BROKER_BACKPRESSURE_FIXED_LIMIT
            # limit: 10
    
          # Configure the parameters for "vegas" algorithm
          # Vegas is an adaptive limit algorithm based on TCP Vegas congestion control algorithm.
          # It estimates a queue size which indicates how many additional requests are in the queue over the estimated limit.
          # The limit is adjusted based on this queueSize.
          # vegas:
            # The initial limit to be used when the broker starts. The limit will be reset to this value when the broker restarts.
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_BACKPRESSURE_VEGAS_INITIALLIMIT
            # initialLimit: 20
    
            # The limit is increased if the queue size is less than this value.
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_BACKPRESSURE_VEGAS_ALPHA
            # alpha: 3
    
            # The limit is decreased if the queue size is greater than this value.
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_BACKPRESSURE_VEGAS_BETA
            # beta: 6
    
          # Configure the parameters for "gradient" algorithm
          # In gradient algorithm, the limit is adjusted based on the gradient of observed latency and an estimated minimum latency.
          # If gradient is less than 1, the limit is decreased otherwise the limit is increased.
          # gradient:
            # The minimum limit. This setting can also be overridden using the environment variable ZEEBE_BROKER_BACKPRESSURE_GRADIENT_MINLIMIT
            # minLimit: 10
    
            # The initial limit to be used when the broker starts. The limit will be reset to this value when the broker restarts.
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_BACKPRESSURE_GRADIENT_INITIALLIMIT
            # initialLimit: 20
    
            # Tolerance for changes from minimum latency. A value >= 1.0 indicating how much change from minimum latency is acceptable
            # before reducing the limit.  For example, a value of 2.0 means that a 2x increase in latency is acceptable.
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_BACKPRESSURE_GRADIENT_RTTTOLERANCE
            # rttTolerance: 2.0
    
          # Configure the parameters for "gradient2" algorithm
          # gradient2:
            # The minimum limit. This setting can also be overridden using the environment variable ZEEBE_BROKER_BACKPRESSURE_GRADIENT2_MINLIMIT
            # minLimit: 10
    
            # The initial limit to be used when the broker starts. The limit will be reset to this value when the broker restarts.
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_BACKPRESSURE_GRADIENT2_INITIALLIMIT
            # initialLimit: 20
    
            # Tolerance for changes from minimum latency. A value >= 1.0 indicating how much change from minimum latency is acceptable
            # before reducing the limit.  For example, a value of 2.0 means that a 2x increase in latency is acceptable.
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_BACKPRESSURE_GRADIENT2_RTTTOLERANCE
            # rttTolerance: 2.0
    
            # longWindow is the length of the window (the number of samples) to calculate the exponentially smoothed average latency.
            # This setting can also be overridden using the environment ZEEBE_BROKER_BACKPRESSURE_GRADIENT2_LONGWINDOW
            # longWindow: 600
    
        # exporters:
          # Configure exporters below
          #
          # Each exporter should be configured following this template:
          #
          # jarPath:
          #   path to the JAR file containing the exporter class. JARs are only loaded once, so you can define
          #   two exporters that point to the same JAR, with the same class or a different one, and use args
          #   to parametrize its instantiation.
          # className:
          #   entry point of the exporter, a class which *must* extend the io.zeebe.exporter.Exporter
          #   interface.
          #
          # A nested table as "args:" will allow you to inject arbitrary arguments into your
          # class through the use of annotations.
          #
          # These setting can also be overridden using the environment variables "ZEEBE_BROKER_EXPORTERS_[exporter name]_..."
          #
    
          # Debug Log Exporter --------------
          #
          # Enable the following debug exporter to log the exported records to console
          #
          # These settings can also be overridden using the environment variables "ZEEBE_BROKER_EXPORTERS_DEBUGLOG_..."
          #
          # debuglog:
            # className: io.zeebe.broker.exporter.debug.DebugLogExporter
            # args:
            #   logLevel: debug
            #   prettyPrint: false
    
          # Debug HTTP Export ---------------
          #
          # Enable the following debug exporter to start a http server to inspect the exported records
          #
          # These settings can also be overridden using the environment variables "ZEEBE_BROKER_EXPORTERS_DEBUGHTTP_..."
          #
          # debugHttp:
            # className: io.zeebe.broker.exporter.debug.DebugHttpExporter
            # args:
            #   port: 8000
            #   limit: 1024
    
          # Elasticsearch Exporter ----------
          # An example configuration for the elasticsearch exporter:
          #
          # These settings can also be overridden using the environment variables "ZEEBE_BROKER_EXPORTERS_ELASTICSEARCH_..."
          #
          # elasticsearch:
            # className: io.zeebe.exporter.ElasticsearchExporter
            #
            # args:
            #   url: http://localhost:9200
            #
            #   bulk:
            #     delay: 5
            #     size: 1000
            #     memoryLimit: 10485760
            #
            #   authentication:
            #     username: elastic
            #     password: changeme
            #
            #   index:
            #     prefix: zeebe-record
            #     createTemplate: true
            #
            #     command: false
            #     event: true
            #     rejection: false
            #
            #     deployment: true
            #     error: true
            #     incident: true
            #     job: true
            #     jobBatch: false
            #     message: false
            #     messageSubscription: false
            #     variable: true
            #     variableDocument: true
            #     workflowInstance: true
            #     workflowInstanceCreation: false
            #     workflowInstanceSubscription: false
            #
            #     ignoreVariablesAbove: 32677
    

    The template for the broker node (without embedded gateway) is largely the same. The only difference is that the embedded gateway is disabled and the corresponding configuration options are absent.
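
    To illustrate how the commented template translates into a working configuration, here is a minimal sketch (not an official example) that enables the Elasticsearch exporter using only values shown in the template above, assuming the template file has been renamed to application.yaml:

    zeebe:
      broker:
        exporters:
          elasticsearch:
            className: io.zeebe.exporter.ElasticsearchExporter
            args:
              # template default; point this at your own Elasticsearch cluster
              url: http://localhost:9200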

    Gateway Configuration Template

    The following snippet represents the default Zeebe gateway configuration, which is shipped with the distribution. It can be found inside the config folder (config/gateway.yaml.template) and can be used to adjust Zeebe to your needs.

    Source on GitHub

    # Zeebe standalone gateway configuration file.
    
    # ! ! ! ! ! ! ! ! ! !
    # In order to activate the settings in this file, rename this file to application.yaml
    # ! ! ! ! ! ! ! ! ! !
    
    # For the configuration of the embedded gateway that is deployed alongside a broker see the gateway section of broker.yaml.template
    
    # Overview -------------------------------------------
    
    # This file contains a complete list of available configuration options.
    
    # Conventions:
    #
    # Byte sizes
    # For buffers and other settings, byte sizes must be specified as strings in the following
    # format: "10U" where U (unit) must be replaced with KB = Kilobytes, MB = Megabytes or GB = Gigabytes.
    # If the unit is omitted, the value is interpreted as plain bytes.
    # Example:
    # sendBufferSize = "16MB" (creates a buffer of 16 Megabytes)
    #
    # Time units
    # Timeouts, intervals, and the like must be specified either in the standard ISO-8601 format used
    # by java.time.Duration, or as strings with the following format: "VU", where:
    #   - V is a numerical value (e.g. 1, 5, 10, etc.)
    #   - U is the unit, one of: ms = Millis, s = Seconds, m = Minutes, or h = Hours
    #
    # Paths:
    # Relative paths are resolved relative to the installation directory of the gateway.
    
    # ----------------------------------------------------
    
    # zeebe:
      # gateway:
        # network:
          # Sets the host the gateway binds to
          # This setting can also be overridden using the environment variable ZEEBE_GATEWAY_NETWORK_HOST.
          # host: 0.0.0.0
          #
          # Sets the port the gateway binds to
          # This setting can also be overridden using the environment variable ZEEBE_GATEWAY_NETWORK_PORT.
          # port: 26500
          #
          # Sets the minimum keep alive interval
          # This setting specifies the minimum accepted interval between keep alive pings. This value must
          # be specified as a positive integer followed by 's' for seconds, 'm' for minutes or 'h' for hours.
          # This setting can also be overridden using the environment variable ZEEBE_GATEWAY_NETWORK_MINKEEPALIVEINTERVAL.
          # minKeepAliveInterval: 30s
    
        # cluster:
          # Sets the broker the gateway should initially contact
          # This setting can also be overridden using the environment variable ZEEBE_GATEWAY_CLUSTER_CONTACTPOINT.
          # contactPoint: 127.0.0.1:26502
    
          # Sets the timeout of requests sent to the broker cluster
          # This setting can also be overridden using the environment variable ZEEBE_GATEWAY_CLUSTER_REQUESTTIMEOUT.
          # requestTimeout: 15s
    
          # Sets the name of the Zeebe cluster to connect to
          # This setting can also be overridden using the environment variable ZEEBE_GATEWAY_CLUSTER_CLUSTERNAME.
          # clusterName: zeebe-cluster
    
          # Sets the member id of the gateway in the cluster
          # This setting can also be overridden using the environment variable ZEEBE_GATEWAY_CLUSTER_MEMBERID.
          # memberId: gateway
    
          # Sets the host the gateway node binds to for internal cluster communication
          # This setting can also be overridden using the environment variable ZEEBE_GATEWAY_CLUSTER_HOST.
          # host: 0.0.0.0
    
          # Sets the port the gateway node binds to for internal cluster communication
          # This setting can also be overridden using the environment variable ZEEBE_GATEWAY_CLUSTER_PORT.
          # port: 26502
    
          # Configure parameters for SWIM protocol which is used to propagate cluster membership
          # information among brokers and gateways
          # membership:
    
            # Configure whether to broadcast member updates to all members.
            # If set to false updates will be gossiped among the members.
            # If set to true, the network traffic may increase, but the time to detect membership changes is reduced.
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_MEMBERSHIP_BROADCASTUPDATES
            # broadcastUpdates: false
    
            # Configure whether to broadcast disputes to all members.
            # If set to true, the network traffic may increase, but the time to detect membership changes is reduced.
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_MEMBERSHIP_BROADCASTDISPUTES
            # broadcastDisputes: true
    
            # Configure whether to notify a suspect node on state changes.
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_MEMBERSHIP_NOTIFYSUSPECT
            # notifySuspect: false
    
            # Sets the interval at which the membership updates are sent to a random member.
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_MEMBERSHIP_GOSSIPINTERVAL
            # gossipInterval: 250ms
    
            # Sets the number of members to which membership updates are sent at each gossip interval.
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_MEMBERSHIP_GOSSIPFANOUT
            # gossipFanout: 2
    
            # Sets the interval at which to probe a random member.
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_MEMBERSHIP_PROBEINTERVAL
            # probeInterval: 1s
    
            # Sets the timeout for a probe response
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_MEMBERSHIP_PROBETIMEOUT
            # probeTimeout: 2s
    
            # Sets the number of failed probes before a member is declared suspect
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_MEMBERSHIP_SUSPECTPROBES
            # suspectProbes: 3
    
            # Sets the timeout after which a suspect member is declared dead.
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_MEMBERSHIP_FAILURETIMEOUT
            # failureTimeout: 10s
    
            # Sets the interval at which this member synchronizes its membership information with a random member.
            # This setting can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_MEMBERSHIP_SYNCINTERVAL
            # syncInterval: 10s
    
        # threads:
          # Sets the number of threads the gateway will use to communicate with the broker cluster
          # This setting can also be overridden using the environment variable ZEEBE_GATEWAY_THREADS_MANAGEMENTTHREADS.
          # managementThreads: 1
    
        # monitoring:
          # Enables metrics collection and exposes the metrics over HTTP
          # This setting can also be overridden using the environment variable ZEEBE_GATEWAY_MONITORING_ENABLED.
          # enabled: false
    
          # Sets the host the monitoring binds to
          # This setting can also be overridden using the environment variable ZEEBE_GATEWAY_MONITORING_HOST.
          # host: 0.0.0.0
    
          # Sets the port the monitoring binds to
          # This setting can also be overridden using the environment variable ZEEBE_GATEWAY_MONITORING_PORT.
          # port: 9600
    
        # security:
          # Enables TLS authentication between clients and the gateway
          # This setting can also be overridden using the environment variable ZEEBE_GATEWAY_SECURITY_ENABLED.
          # enabled: false
    
          # Sets the path to the certificate chain file
          # This setting can also be overridden using the environment variable ZEEBE_GATEWAY_SECURITY_CERTIFICATECHAINPATH.
          # certificateChainPath:
    
          # Sets the path to the private key file location
          # This setting can also be overridden using the environment variable ZEEBE_GATEWAY_SECURITY_PRIVATEKEYPATH.
          # privateKeyPath:
    
        # longPolling:
          # Enables long polling for available jobs
          # This setting can also be overridden using the environment variable ZEEBE_GATEWAY_LONGPOLLING_ENABLED.
          # enabled: true
    

    Gateway Health Indicators and Probes

    The health status for a standalone gateway is available at {zeebe-gateway}:8080/actuator/health

    The following health indicators are enabled by default:

    • Gateway Started - checks whether the Gateway is running (i.e. not currently starting and not yet shut down)
    • Gateway Responsive - checks whether the Gateway can handle a request within a given timeout
    • Gateway Cluster Awareness - checks whether the Gateway is aware of other nodes in the cluster
    • Gateway Partition Leader Awareness - checks whether the Gateway is aware of partition leaders in the cluster
    • Disk Space - checks that the free disk space is greater than 10 MB
    • Memory - checks that at least 10% of max memory (heap) is still available

    Health indicators are set to sensible defaults. For specific use cases, it might be necessary to customize health probes.

    Startup Probe

    The started probe is available at {zeebe-gateway}:8080/actuator/health/startup

    In the default configuration, this is merely an alias for the Gateway Started health indicator. Other configurations are possible (see below).

    Liveness Probe

    The liveness probe is available at {zeebe-gateway}:8080/actuator/health/liveness

    It is based on the health indicators mentioned above.

    In the default configuration, the liveness probe comprises the following health indicators:

    • Gateway Started - checks whether the Gateway is running (i.e. not currently starting and not yet shut down)
    • Liveness Gateway Responsive - checks whether the Gateway can handle a request within an ample timeout, but will only report a DOWN health status after the underlying health indicator is down for more than 10 minutes
    • Liveness Gateway Cluster Awareness - based on Gateway cluster awareness, but will only report a DOWN health status after the underlying health indicator is down for more than 5 minutes
    • Liveness Gateway Partition Leader Awareness - based on Gateway partition leader awareness, but will only report a DOWN health status after the underlying health indicator is down for more than 5 minutes
    • Liveness Disk Space - checks that the free disk space is greater than 1 MB
    • Liveness Memory - checks that at least 1% of max memory (heap) is still available

    Note that health indicators with the liveness prefix are intended to be customized for the liveness probe. This allows defining tighter thresholds (e.g. 1% free memory for liveness vs. 10% for health), as well as adding tolerance for short downtimes (e.g. the gateway has had no awareness of other nodes in the cluster for more than 5 minutes).

    Customizing Health Probes

    Global settings for all health indicators:

    • management.health.defaults.enabled=true - enables (default) or disables all health indicators
    • management.endpoint.health.show-details=always/never - toggles whether a summary or details (default) of the health indicators will be returned

    Startup Probe

    Settings for started probe:

    • management.endpoint.health.group.startup.show-details=never - toggles whether a summary (default) or details of the startup probe will be returned
    • management.endpoint.health.group.startup.include=gatewayStarted - defines which health indicators are included in the startup probe

    Liveness Probe

    Settings for liveness probe:

    • management.endpoint.health.group.liveness.show-details=never - toggles whether a summary (default) or details of the liveness probe will be returned
    • management.endpoint.health.group.liveness.include=gatewayStarted,livenessGatewayResponsive,livenessGatewayClusterAwareness,livenessGatewayPartitionLeaderAwareness,livenessDiskSpace,livenessMemory - defines which health indicators are included in the liveness probe

    Note that the individual contributing health indicators of the liveness probe can be configured as well (see below).
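
    Because these probe groups are ordinary Spring Boot properties, one way to set them is in the gateway's application.yaml. The following is a hedged sketch that restricts the liveness probe to a smaller, purely illustrative subset of the indicators listed above:

    management:
      endpoint:
        health:
          group:
            liveness:
              show-details: never
              # illustrative subset; the default include list is shown above
              include: gatewayStarted,livenessGatewayResponsive,livenessMemory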

    Gateway Started

    Settings for gateway started health indicator:

    • management.health.gateway-started.enabled=true - enables (default) or disables this health indicator

    Gateway Responsive

    Settings for gateway responsiveness health indicator:

    • management.health.gateway-responsive.enabled=true - enables (default) or disables this health indicator
    • management.health.gateway-responsive.requestTimeout=500ms - defines the timeout for the request; if the test completes before the timeout, the health status is UP, otherwise it is DOWN
    • management.health.liveness.gateway-responsive.requestTimeout=5s - defines the timeout for the request for liveness probe; if the request completes before the timeout, the health status is UP
    • management.health.liveness.gateway-responsive.maxdowntime=10m - defines the maximum downtime before the liveness health indicator for responsiveness will flip

    Gateway Cluster Awareness

    Settings for gateway cluster awareness health indicator:

    • management.health.gateway-clusterawareness.enabled=true - enables (default) or disables this health indicator (and its liveness counterpart)
    • management.health.liveness.gateway-clusterawareness.maxdowntime=5m - defines the maximum downtime before the liveness health indicator for cluster awareness will flip. In other words: this health indicator will report DOWN after the gateway was unaware of other members in the cluster for more than 5 minutes

    Gateway Partition Leader Awareness

    Settings for gateway partition leader awareness health indicator:

    • management.health.gateway-partitionleaderawareness.enabled=true - enables (default) or disables this health indicator (and its liveness counterpart)
    • management.health.liveness.gateway-partitionleaderawareness.maxdowntime=5m - defines the maximum downtime before the liveness health indicator for partition leader awareness will flip. In other words: this health indicator will report DOWN after the gateway was unaware of partition leaders for more than 5 minutes

    Disk Space

    This is arguably the least critical health indicator, given that the standalone gateway does not write to disk. The only exception may be the writing of log files, which depends on the log configuration.

    Settings for disk space health indicator:

    • management.health.diskspace.enabled=true - enables (default) or disables this health indicator (and its liveness counterpart)
    • management.health.diskspace.threshold=10MB - defines the threshold for the required free disk space
    • management.health.diskspace.path=. - defines the path for which the free disk space is examined
    • management.health.liveness.diskspace.threshold=1MB - defines the threshold for the required free disk space for liveness
    • management.health.liveness.diskspace.path=. - defines the path for which the free disk space for liveness is examined

    Memory

    This health indicator examines free memory (heap).

    Settings for memory health indicator:

    • management.health.memory.enabled=true - enables (default) or disables this health indicator (and its liveness counterpart)
    • management.health.memory.threshold=0.1 - defines the threshold for the required free memory. The default is 0.1, which is interpreted as 10% of max memory
    • management.health.liveness.memory.threshold=0.01 - defines the threshold for the required free memory for liveness. The default is 0.01, which is interpreted as 1% of max memory
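
    As with the probe groups above, these thresholds are plain properties. A sketch of an application.yaml fragment that simply restates the documented defaults for the disk space and memory indicators might look like this:

    management:
      health:
        diskspace:
          threshold: 10MB   # required free disk space for the health endpoint
        memory:
          threshold: 0.1    # 10% of max heap
        liveness:
          diskspace:
            threshold: 1MB  # tighter threshold used by the liveness probe
          memory:
            threshold: 0.01 # 1% of max heap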

    Environment Variables

    Environment Variables for Configuration

    The configuration can be provided as a file or through environment variables. Mixing both sources is also possible; in that case, environment variables take precedence over the settings in the configuration file.
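
    For example, the gateway port documented in the template above can be set either in the configuration file or via its environment variable; if both are present, the environment variable wins:

    # application.yaml
    zeebe:
      gateway:
        network:
          port: 26500

    # equivalent override via environment variable (takes precedence over the file):
    # ZEEBE_GATEWAY_NETWORK_PORT=26500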

    All available environment variables are documented in the configuration file templates above.

    Environment Variables for Operators

    The following environment variables are intended for operators:

    • ZEEBE_LOG_LEVEL: Sets the log level of the Zeebe Logger (default: info).
    • ZEEBE_LOG_APPENDER: Sets the console log appender (default: Console). We recommend using Stackdriver to output JSON-formatted log messages if Zeebe runs on Google Cloud Platform.

    Environment Variables for Developers

    The following environment variables are intended for developers:

    • SPRING_PROFILES_ACTIVE=dev: If this is set, the broker will start in a temporary folder and all data will be cleaned up upon exit
    • ZEEBE_DEBUG=true/false: Activates a DebugLogExporter with default settings. The value of the environment variable toggles pretty printing

    Note: It is not recommended to use these settings in production.

    Deprecated Features

    This section lists deprecated features.

    Deprecated in 0.23.0-alpha2

    • TOML configuration - deprecated and removed in 0.23.0-alpha2
    • Legacy environment variables - deprecated in 0.23.0-alpha2, removed in 0.25.0

    New configuration (YAML), using exporters as an example:

    exporters:
      elasticsearch:
        className: io.zeebe.exporter.ElasticsearchExporter
      debughttp:
        className: io.zeebe.broker.exporter.debug.DebugHttpExporter
    

    In terms of specifying values, there were two minor changes:

    • Memory sizes are now specified like this: 512MB (old way: 512M)
    • Durations, e.g. timeouts, can now also be given in ISO-8601 duration format. However, you can still use the established way and specify a timeout of 30s (see the sketch below)
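
    For instance, the gateway request timeout from the template above can be written in either notation, while byte sizes use the full unit suffix:

    zeebe:
      gateway:
        cluster:
          requestTimeout: 15s      # established format
          # requestTimeout: PT15S  # equivalent ISO-8601 duration

    # byte sizes now use the full unit, e.g. "512MB" instead of the old "512M"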