Mistral-api process getting auto-restarted

We have one Mistral server with 8 mistral-api processes running on a single host. When I restart the mistral process, everything runs fine for some hours.
After 8-10 hours, the mistral-api processes start getting restarted automatically, with the following errors in the logs:

[2018-08-23 13:37:20 +0000] [3239] [INFO] Worker exiting (pid: 3239)
[2018-08-23 13:37:21 +0000] [3390] [INFO] Booting worker with pid: 3390
[2018-08-23 13:38:49 +0000] [23257] [CRITICAL] WORKER TIMEOUT (pid:3348)

Either at the time of the restart or just after mistral-api starts, all workflow executions fail with error code 1. Below is the stack trace:

2018-08-23 18:39:15.289 23325 ERROR mistral.notifiers.default_notifier [req-052991f8-d388-484e-bd88-73c8f30f6788 - - - - -] Unable to process event for publisher "st2".: Exception: [a34f2cd0-75e2-4af0-9dc3-3d878e846345] Unable to publish event because st2 returned status code 401. {
    "faultstring": "Unauthorized"
}
2018-08-23 18:39:15.289 23325 ERROR mistral.notifiers.default_notifier Traceback (most recent call last):
2018-08-23 18:39:15.289 23325 ERROR mistral.notifiers.default_notifier   File "/opt/stackstorm/mistral/lib/python2.7/site-packages/mistral/notifiers/default_notifier.py", line 39, in notify
2018-08-23 18:39:15.289 23325 ERROR mistral.notifiers.default_notifier     publisher.publish(ex_id, data, event, timestamp, **params)
2018-08-23 18:39:15.289 23325 ERROR mistral.notifiers.default_notifier   File "/opt/stackstorm/mistral/lib/python2.7/site-packages/st2mistral/notifiers/stackstorm_notifier.py", line 297, in publish
2018-08-23 18:39:15.289 23325 ERROR mistral.notifiers.default_notifier     func(ex_id, data, event, timestamp, **kwargs)
2018-08-23 18:39:15.289 23325 ERROR mistral.notifiers.default_notifier   File "/opt/stackstorm/mistral/lib/python2.7/site-packages/st2mistral/notifiers/stackstorm_notifier.py", line 145, in on_workflow_status_update
2018-08-23 18:39:15.289 23325 ERROR mistral.notifiers.default_notifier     'status code %s. %s' % (root_id, resp.status_code, resp.text)
2018-08-23 18:39:15.289 23325 ERROR mistral.notifiers.default_notifier Exception: [a34f2cd0-75e2-4af0-9dc3-3d878e846345] Unable to publish event because st2 returned status code 401. {
2018-08-23 18:39:15.289 23325 ERROR mistral.notifiers.default_notifier     "faultstring": "Unauthorized"
2018-08-23 18:39:15.289 23325 ERROR mistral.notifiers.default_notifier }
2018-08-23 18:39:15.289 23325 ERROR mistral.notifiers.default_notifier

Kindly help us find the cause of this.
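
To see how often workers are being killed, the timeout lines in the first log excerpt can be tallied per hour. This is only a minimal sketch, assuming the log format shown in this post (which looks like gunicorn output); the function name `count_worker_timeouts` is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
# [2018-08-23 13:38:49 +0000] [23257] [CRITICAL] WORKER TIMEOUT (pid:3348)
TIMEOUT_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[(?P<master>\d+)\] \[CRITICAL\] "
    r"WORKER TIMEOUT \(pid:\s*(?P<worker>\d+)\)"
)


def count_worker_timeouts(lines):
    """Return a Counter mapping each hour ('YYYY-MM-DD HH') to its timeout count."""
    per_hour = Counter()
    for line in lines:
        match = TIMEOUT_RE.match(line.strip())
        if match:
            # The first 13 characters of the timestamp are the date plus the hour.
            per_hour[match.group("ts")[:13]] += 1
    return per_hour


# Sample input copied from the log excerpt in this post.
sample = [
    "[2018-08-23 13:37:20 +0000] [3239] [INFO] Worker exiting (pid: 3239)",
    "[2018-08-23 13:37:21 +0000] [3390] [INFO] Booting worker with pid: 3390",
    "[2018-08-23 13:38:49 +0000] [23257] [CRITICAL] WORKER TIMEOUT (pid:3348)",
]
print(count_worker_timeouts(sample))
```

Running this over the full mistral-api log would show whether the timeouts begin at a consistent interval after a restart.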

The 401 error makes me think it might be related to authentication token TTLs.

Have you made any changes to any of the defaults around token TTLs?
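
For context on the TTL hypothesis: a token is rejected once its age reaches the configured TTL, so requests that worked shortly after a restart can begin to fail with 401 later. A minimal sketch of that arithmetic follows; the 24-hour TTL is an assumption for illustration only (check your own configuration), and `is_token_expired` is a hypothetical helper, not part of st2.

```python
from datetime import datetime, timedelta, timezone

# Assumed TTL for illustration only; the real value comes from your configuration.
ASSUMED_TTL = timedelta(hours=24)


def is_token_expired(issued_at, now, ttl=ASSUMED_TTL):
    """Return True once the token's age has reached the TTL."""
    return now - issued_at >= ttl


# Hypothetical issue time, chosen only to demonstrate the comparison.
issued = datetime(2018, 8, 22, 13, 0, tzinfo=timezone.utc)
just_before = datetime(2018, 8, 23, 12, 59, tzinfo=timezone.utc)
just_after = datetime(2018, 8, 23, 13, 38, tzinfo=timezone.utc)

print(is_token_expired(issued, just_before))
print(is_token_expired(issued, just_after))
```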

No, there is no change to the authentication TTL configuration. The only TTL changes are for purging old logs and data.


[api]
# Host and port to bind the API server.
host = <ip>
port = 9101
logging = /etc/st2/logging.api.conf
mask_secrets = True
# allow_origin is required for handling CORS in st2 web UI.
# allow_origin = http://myhost1.example.com:3000,http://myhost2.example.com:3000

[stream]
logging = /etc/st2/logging.stream.conf

[sensorcontainer]
logging = /etc/st2/logging.sensorcontainer.conf

[rulesengine]
logging = /etc/st2/logging.rulesengine.conf

[actionrunner]
logging = /etc/st2/logging.actionrunner.conf
virtualenv_opts = --always-copy


[resultstracker]
query_interval = 1
thread_pool_size = 100
logging = /etc/st2/logging.resultstracker.conf

[notifier]
logging = /etc/st2/logging.notifier.conf

[exporter]
logging = /etc/st2/logging.exporter.conf

[auth]
host = 0.0.0.0
port = 9100
use_ssl = False
debug = False
enable = True
logging = /etc/st2/logging.auth.conf

mode = standalone

# Note: Settings below are only used in "standalone" mode
backend = flat_file
backend_kwargs = {"file_path": "/etc/st2/htpasswd"}

# Base URL to the API endpoint excluding the version (e.g. http://myhost.net:9101/)
api_url =

[system]
base_path = /opt/stackstorm

[webui]
# webui_base_url = https://mywebhost.domain

[syslog]
host = 127.0.0.1
port = 514
facility = local7
protocol = udp

[log]
excludes = requests,paramiko
redirect_stderr = False
mask_secrets = True

[system_user]
user = stanley
ssh_key_file = /home/stanley/.ssh/id_rsa

[messaging]
url = amqp://stackstorm:stackstorm@<rabbitmq_ip>:5672//stackstorm

[ssh_runner]
remote_dir = /tmp

[mistral]
api_url = http://<ip>:9101
v2_base_url = http://<ip>:8989/v2

[coordination]
url = kazoo://<ip>:2181

[garbagecollector]
logging = /etc/st2/logging.garbagecollector.conf
action_executions_ttl = 10
action_executions_output_ttl = 10
trigger_instances_ttl = 1

Do you have very long-running workflows, or items paused for > 24 hours?

Nope, workflows complete in 4-5 minutes at most.
Just an FYI: we had ~3 months of data in the Mistral DB, and as soon as we cleaned it up to 3 days, the error stopped appearing. However, it is a little early to say it is permanently fixed. We will monitor for 1-2 more days and then confirm.
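
The cleanup described here amounts to keeping only records newer than a cutoff. A minimal sketch of that selection logic follows, assuming each record carries a `created_at` timestamp; the record layout and the `select_expired` helper are hypothetical, and the code does not touch the Mistral database.

```python
from datetime import datetime, timedelta


def select_expired(records, now, keep=timedelta(days=3)):
    """Return the records created before the retention cutoff (now - keep)."""
    cutoff = now - keep
    return [record for record in records if record["created_at"] < cutoff]


# Hypothetical records, used only to demonstrate the cutoff comparison.
now = datetime(2018, 8, 23, 18, 0)
records = [
    {"id": "old", "created_at": datetime(2018, 5, 23, 18, 0)},    # about 3 months old
    {"id": "recent", "created_at": datetime(2018, 8, 22, 18, 0)},  # 1 day old
]
print([record["id"] for record in select_expired(records, now)])
```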