We have a long-running task that can take up to 8 days to complete. Most of our executions complete successfully, but we run into an issue where all of our executions start failing. Looking at the logs, we see HTTP errors caused by an expired token. The TTL for auth tokens in our st2.conf is set to 10 days, yet we see this issue for tasks that have been running for less than 4 days.
Mistral log snippet:
2018-10-08 12:32:40.571 497 WARNING st2mistral.utils.http [req-315de175-21ab-40c5-ba20-70d95798be0e - - - - -] [stackstorm] HTTP request returned connection error. Retrying...: ConnectionError: ('Connection aborted.', BadStatusLine("''",))
st2 log snippet:
/var/log/st2/st2api.log:72022064:2018-10-08 12:32:04,844 140599707325296 INFO logging [-] e2cf5d4b-a847-4413-8bf5-cb3e9fc3a95a - POST /v1/executions with query={} (remote_addr='127.0.0.1',method='POST',request_id='e2cf5d4b-a847-4413-8bf5-cb3e9fc3a95a',query={},path='/v1/executions')
/var/log/st2/st2api.log:72022065:2018-10-08 12:32:04,847 140599707325296 AUDIT auth [-] Token with id "5ba5c9482b79db008f58c297" has expired.
/var/log/st2/st2api.log:72022066:2018-10-08 12:32:04,848 140599707325296 ERROR router [-] Token has expired.
/var/log/st2/st2api.log:72022073:2018-10-08 12:32:04,849 140599707325296 INFO logging [-] e2cf5d4b-a847-4413-8bf5-cb3e9fc3a95a - 401 58 5.014ms (content_length=58,request_id='e2cf5d4b-a847-4413-8bf5-cb3e9fc3a95a',runtime=5.014,remote_addr='127.0.0.1',status=401,method='POST',path='/v1/executions')
/var/log/st2/st2api.log:72022074:2018-10-08 12:32:09,596 140599707315728 INFO logging [-] 030974bc-5181-491b-b8ca-8bef57aea37d - POST /v1/executions with query={} (remote_addr='127.0.0.1',method='POST',request_id='030974bc-5181-491b-b8ca-8bef57aea37d',query={},path='/v1/executions')
/var/log/st2/st2api.log:72022075:2018-10-08 12:32:09,599 140599707315728 AUDIT auth [-] Token with id "5ba4e4492b79db009206d556" has expired.
/var/log/st2/st2api.log:72022076:2018-10-08 12:32:09,599 140599707315728 ERROR router [-] Token has expired.
/var/log/st2/st2api.log:72022083:2018-10-08 12:32:09,600 140599707315728 INFO logging [-] 030974bc-5181-491b-b8ca-8bef57aea37d - 401 58 4.493ms (content_length=58,request_id='030974bc-5181-491b-b8ca-8bef57aea37d',runtime=4.493,remote_addr='127.0.0.1',status=401,method='POST',path='/v1/executions')
/var/log/st2/st2api.log:72022084:2018-10-08 12:32:18,950 140599706935856 INFO logging [-] ffd0f7b1-a62b-4cd2-b014-cef078e6113d - POST /v1/executions with query={} (remote_addr='127.0.0.1',method='POST',request_id='ffd0f7b1-a62b-4cd2-b014-cef078e6113d',query={},path='/v1/executions')
/var/log/st2/st2api.log:72022085:2018-10-08 12:32:18,953 140599706935856 AUDIT auth [-] Token with id "5ba884d92b79db00a3f15097" has expired.
/var/log/st2/st2api.log:72022086:2018-10-08 12:32:18,953 140599706935856 ERROR router [-] Token has expired.
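One way to check how old these tokens actually were when they were rejected is sketched below. It assumes the token ids in the AUDIT lines are MongoDB ObjectIds, whose first 4 bytes encode the creation time in seconds since the Unix epoch; we have not confirmed that this is how st2 generates them. The helper name `objectid_created_at` is ours, not part of st2.

```python
from datetime import datetime, timezone


def objectid_created_at(object_id: str) -> datetime:
    """Return the creation time embedded in a MongoDB-style ObjectId.

    Assumption: the id is a 24-character hex ObjectId, so its first
    8 hex characters are a big-endian Unix timestamp in seconds.
    """
    seconds = int(object_id[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Token ids copied from the AUDIT lines in the st2api log above.
token_ids = [
    "5ba5c9482b79db008f58c297",
    "5ba4e4492b79db009206d556",
    "5ba884d92b79db00a3f15097",
]

for token_id in token_ids:
    print(token_id, objectid_created_at(token_id).isoformat())
```

If the assumption holds, the printed creation times can be compared with the log timestamps and the 10-day TTL to see whether the tokens belong to the current executions or to older ones.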
st2 conf snippet:
api_url = http://127.0.0.1:9101
token_ttl = 864000
service_token_ttl = 864000
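As a sanity check on these values, the small sketch below converts the configured TTLs from seconds to days and compares them with our longest task duration. The helper name `ttl_days` and the variable `longest_task_days` are ours, used only for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def ttl_days(ttl_seconds: int) -> float:
    """Convert a TTL given in seconds to days."""
    return ttl_seconds / SECONDS_PER_DAY


# Values copied from the st2.conf snippet above.
token_ttl = 864000
service_token_ttl = 864000

# Longest task duration we expect, in days (from our description above).
longest_task_days = 8

print(ttl_days(token_ttl))          # 10.0
print(ttl_days(service_token_ttl))  # 10.0
print(ttl_days(token_ttl) > longest_task_days)  # True
```

Both TTLs come out to 10 days, so on paper a token issued at the start of a task should outlive even our longest run.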
We are currently running a single-box deployment with st2auth enabled. Our current workaround is to restart all the running services and then kick off all the stuck workflows manually.
Any help is greatly appreciated. Thank you!