Inconsistent Orquesta WF execution times for same workflow

techdiverdown · December 26, 2019, 7:19pm

I have a very simple workflow, borrowed from the examples on GitHub, as follows:
version: 1.0

description: A workflow demonstrating with items and concurrent processing.

input:
  - members: [1,2,3,4,5,6,7,8,9,10]
  - concurrency: 5
  - trace_id

tasks:
  task1:
    with:
      items: <% ctx(members) %>
      concurrency: <% ctx(concurrency) %>
    action: core.echo message="<% item() %>, resistance is futile!"

output:
  - items: <% task(task1).result.items.select($.result.stdout) %>

I am running this in Docker on my Mac. The goal is to vary the concurrency, the number of items, and the number of action runners to understand how to size our k8s cluster appropriately (number of action runners, workflow engines, and so on). The issue is that I am getting wildly varying times running the exact same workflow above, anywhere from 4 seconds to over 21 seconds. Watching with st2 execution tail, I can see some actions just seem to stop for 2-3 seconds, even though it is the same action in the loop above. I am using top and some other tools, but I cannot determine why the workflow hangs periodically. This happens in our production k8s cluster as well, so I am trying to work out how to debug it locally first. Any suggestions appreciated.
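
For anyone trying to reproduce this kind of variance, a rough timing harness can help put numbers on it. The sketch below is only an illustration and not something used in this thread: it runs an arbitrary command several times and summarizes the wall-clock durations. The helper names time_runs and summarize are made up for this example, and the command to run (for instance an st2 run invocation of your workflow) is left to the caller.

```python
import statistics
import subprocess
import time


def time_runs(cmd, repeats=5):
    """Run `cmd` (a list of arguments) `repeats` times.

    Returns the wall-clock duration of each run in seconds.
    """
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises if the command fails, so failed runs are not timed silently.
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return min, max, mean and sample standard deviation of the durations."""
    return {
        "min": min(durations),
        "max": max(durations),
        "mean": statistics.mean(durations),
        # stdev needs at least two samples.
        "stdev": statistics.stdev(durations) if len(durations) > 1 else 0.0,
    }
```

Calling summarize(time_runs(cmd, repeats=10)) for each concurrency setting gives a spread to compare, rather than a single run that may or may not have hit a stall.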

punkrokk · February 17, 2020, 7:03pm

Did you ever solve this? I’m interested in what you found.

techdiverdown · February 17, 2020, 11:13pm

Yes. Basically, publish, which is how we pass variables between actions in workflows, pushes data to Mongo. This goes through MongoEngine, which can be very slow for the large, complex structures we use. We moved to Redis: we push the data to Redis and then just pass the Redis key around between actions. This greatly reduced CPU usage and made things much faster and more scalable.
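
The pass-a-key-instead-of-the-payload idea described above might look roughly like the sketch below. This is not the actual code from this thread; the helper names stash and fetch, the key prefix, and the TTL are all assumptions for illustration. The in-memory class is a stand-in so the example runs without a server; it mirrors the set and get method names of redis-py's redis.Redis client, which could be passed in its place.

```python
import json
import uuid


class InMemoryStore:
    """Stand-in store with the same set/get method names as redis.Redis.

    Hypothetical helper so the sketch runs without a Redis server; the
    expiry argument is accepted but ignored here.
    """

    def __init__(self):
        self._data = {}

    def set(self, name, value, ex=None):
        self._data[name] = value
        return True

    def get(self, name):
        return self._data.get(name)


def stash(store, payload, ttl_seconds=3600):
    """Serialize a large payload, store it, and return only the key.

    The returned key is the small value a workflow would publish
    instead of the payload itself.
    """
    key = f"wf:payload:{uuid.uuid4().hex}"  # key prefix is an assumption
    store.set(key, json.dumps(payload), ex=ttl_seconds)
    return key


def fetch(store, key):
    """Load the payload back inside a downstream action."""
    raw = store.get(key)
    if raw is None:
        raise KeyError(f"no payload stored under {key!r}")
    # json.loads accepts both str and bytes, so this also works with
    # the bytes a real Redis client returns by default.
    return json.loads(raw)
```

With this shape, the workflow context only ever carries a short string, so whatever the workflow engine persists per task stays small regardless of how large the underlying structure is.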

punkrokk · February 18, 2020, 12:41am

Awesome. Would love to read a blog post about how you solved this, as I’m sure others are experiencing this challenge.