How to pull all running actions


(Deepchandna) #1

Is there a way to pull only the running actions? I want to cancel, all at once, every running action that is more than 24 hours old.

As an alternative, I cancelled them directly in MongoDB, but the StackStorm GUI still shows them as running. When I then try to kill them manually in the GUI, it shows the error below:

Unable to cancel execution “5b77b0c83648f811916c9288”. Execution object missing link to liveaction 5b77b0c83648f811916c9287.


(Tomaz Muraus) #2

You can retrieve all the running executions which were started within a specific time period using the following command:

st2 execution list --status=running --timestamp-gt=<now - 48 hours in ISO format> --timestamp-lt=<now - 24 hours in ISO format>
# ISO date time format uses the following notation: 2000-01-01T12:00:00.000Z

This will return all the executions in the running state which are more than 24 hours but less than 48 hours old (aka day old executions).
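The two timestamp placeholders above can be computed rather than typed by hand. Here is a minimal sketch, assuming the timestamps are expressed in UTC (which is what the trailing Z in the notation denotes); iso_z and day_old_window are hypothetical helper names, not part of StackStorm:

```python
from datetime import datetime, timedelta, timezone


def iso_z(moment):
    """Format an aware datetime as 2000-01-01T12:00:00.000Z (UTC, milliseconds)."""
    utc = moment.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + "%03dZ" % (utc.microsecond // 1000)


def day_old_window(now=None):
    """Return (timestamp_gt, timestamp_lt) spanning 48 to 24 hours before `now`."""
    now = now or datetime.now(timezone.utc)
    return iso_z(now - timedelta(hours=48)), iso_z(now - timedelta(hours=24))


if __name__ == "__main__":
    gt, lt = day_old_window()
    # Print the command from the post with the placeholders filled in.
    print("st2 execution list --status=running --timestamp-gt=%s --timestamp-lt=%s" % (gt, lt))
```

Running it prints the list command with both values filled in, ready to paste into a shell.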

Then cancel them using the st2 execution cancel <execution id> command:

st2 execution cancel <id 1> <id 2> <id n>...

This can of course all be scripted either by using the CLI directly or by using the Python client library to talk to the API.
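As a rough illustration of the CLI-scripting route, the sketch below wraps the two commands shown earlier. It rests on assumptions that should be checked against st2 execution list --help on your version: that the -j flag makes the CLI print JSON, and that each listed item carries an "id" field. list_running_ids and cancel_executions are hypothetical helper names. The CLI also limits how many results it lists by default, so a large backlog may need more than one pass.

```python
import json
import subprocess


def list_running_ids(timestamp_gt, timestamp_lt, run=subprocess.check_output):
    """Return the IDs of running executions started inside the given window."""
    cmd = [
        "st2", "execution", "list",
        "--status=running",
        "--timestamp-gt=" + timestamp_gt,
        "--timestamp-lt=" + timestamp_lt,
        "-j",  # assumption: prints the result as a JSON list
    ]
    # Assumption: every item in the JSON list has an "id" key.
    return [item["id"] for item in json.loads(run(cmd))]


def cancel_executions(ids, run=subprocess.check_output, dry_run=True):
    """Build one `st2 execution cancel` call for all IDs; run it unless dry_run."""
    if not ids:
        return None
    cmd = ["st2", "execution", "cancel"] + list(ids)
    if not dry_run:
        run(cmd)
    return cmd
```

With dry_run left at True, cancel_executions only returns the command it would run, which gives you a chance to review the IDs before anything is cancelled.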

Manipulating MongoDB directly is not a good idea unless you know exactly what you are doing, because it can cause database-level inconsistencies if you don’t remove and clean up all the related objects.


(Deepchandna) #3

@kami, thanks very much. Your suggestion worked. Can you advise on the following as well?

I am running a script against the actions I killed manually in MongoDB, but they are still alive on the StackStorm side. How can I kill them? Here is an example:

$ ./RunningActions.sh
5b776f5b3648f811916c8bfe
ERROR: 500 Server Error: Internal Server Error
MESSAGE: Execution object missing link to liveaction 5b776f5b3648f811916c8bfd. for url: http://127.0.0.1:9101/v1/executions/5b776f5b3648f811916c8bfe


(Tomaz Muraus) #4

It looks like this error happens because you manually deleted objects from the database and the database ended up in an inconsistent state.

It looks like you deleted action execution objects, but not the related liveaction objects. The only way to rectify this is to also manually delete the orphaned liveaction objects from the database.

Going forward (as mentioned above), I would suggest using “st2 execution cancel” to cancel such executions. If you manually delete some objects, but not all the related ones, you can end up in an inconsistent state like that.


(Deepchandna) #5

Thanks. Will look into it.


(Deepchandna) #6

@kami, I am also facing one more issue: transactions always stay in the running state, and when I check the “mistral-server.log” file, I see the error below:

2018-12-31 01:33:50.169 95642 ERROR mistral.notifiers.default_notifier Exception: [fa9792ca-9328-475a-8ec3-a66e7b17efbe] Unable to publish event because st2 returned status code 500. {
2018-12-31 01:33:50.169 95642 ERROR mistral.notifiers.default_notifier     "faultstring": "Internal Server Error"
2018-12-31 01:33:50.169 95642 ERROR mistral.notifiers.default_notifier }
2018-12-31 01:33:50.169 95642 ERROR mistral.notifiers.default_notifier

I restart all the StackStorm services to fix this issue, but it happens again within 2-3 days.

I also observed that this issue happens only on my RHEL7 servers, not on the RHEL6 servers. Workflows never get stuck on the RHEL6 servers running StackStorm.

hostname : server1
$ st2 --version
st2 2.7.1, on Python 2.7
$
OS - LINUX
Release - RHEL6


hostname : server2
$ st2 --version
st2 2.7.1, on Python 2.7
$
OS - LINUX
Release - RHEL7

Please suggest a solution.


(Tomaz Muraus) #7

It looks like this could also be related to the “inconsistent database” issue in case those Mistral tasks are referring to the executions you have manipulated.

Having said that - we would need to see st2api.log (/var/log/st2/st2api.log) to see why it’s returning 500 when Mistral tries to submit a result back to StackStorm API using the callback approach.


(Deepchandna) #8

You are correct. The issue seems to be with st2api. I see the error below during execution, and when I restarted only st2api, it got fixed.

2018-12-31 02:32:33,855 140319692467248 AUDIT auth [-] Token with id "5c29d3ed04fa2be29c59fc53" is validated.
2018-12-31 02:32:33,859 140319692467248 ERROR error_handling [-] API call failed: [Errno 2] No such file or directory: '/tmp/tmpPm0ad5'
Traceback (most recent call last):
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/st2common/middleware/error_handling.py", line 47, in __call__
    return self.app(environ, start_response)
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/st2common/middleware/streaming.py", line 48, in __call__
    return self.app(environ, start_response)
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/st2common/router.py", line 519, in as_wsgi
    resp = self(req)
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/st2common/router.py", line 350, in __call__
    if not req.body:
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/webob/request.py", line 702, in body
    self.make_body_seekable() # we need this to have content_length
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/webob/request.py", line 941, in make_body_seekable
    self.copy_body()
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/webob/request.py", line 991, in copy_body
    fileobj = self.make_tempfile()
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/webob/request.py", line 1031, in make_tempfile
    return tempfile.TemporaryFile()
  File "/usr/lib64/python2.7/tempfile.py", line 489, in TemporaryFile
    (fd, name) = _mkstemp_inner(dir, prefix, suffix, flags)
  File "/usr/lib64/python2.7/tempfile.py", line 239, in _mkstemp_inner
    fd = _os.open(file, flags, 0600)
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/eventlet/green/os.py", line 109, in open
    fd = __original_open__(file, flags, mode)
OSError: [Errno 2] No such file or directory: '/tmp/tmpPm0ad5' (_exception_data={},_exception_class='OSError',_exception_message="[Errno 2] No such file or directory: '/tmp/tmpPm0ad5'")
2018-12-31 02:32:33,860 140319692467248 INFO logging [-] 5d2d6374-bd48-40f7-9eba-3c9a331f25b5 - 500 46 7.507ms (content_length=46,request_id='5d2d6374-bd48-40f7-9eba-3c9a331f25b5',runtime=7.507,remote_addr='127.0.0.1',status=500,method='PUT',path='/v1/executions/5c29d3ed04fa2b2cbd61b910')

What should be the path forward now?


(Tomaz Muraus) #9

I haven’t seen this error before, but it doesn’t look like it’s related to database inconsistencies.

It looks like it’s related to gunicorn file based locking (for some reason the lock file doesn’t exist when it tries to delete / release it). Did you change any st2api gunicorn settings like the number of workers / threads or similar?

Also, how did you install StackStorm - installer script? Is this a VM / server or a container? If it’s a container, it could be something related to container /tmp isolation and permissions.
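The traceback above fails inside tempfile.TemporaryFile() while creating a file under /tmp, so one quick thing to rule out is whether the temp directory is usable at all. Below is a minimal diagnostic sketch; check_temp_dir is a hypothetical helper, and it is only meaningful if run as the same user and in the same environment as the st2api process (an assumption, since a different user or service sandbox may see a different /tmp).

```python
import errno
import os
import tempfile


def check_temp_dir(path=None):
    """Try to create a temporary file in `path` and report what happened."""
    path = path or tempfile.gettempdir()
    report = {
        "dir": path,
        "exists": os.path.isdir(path),
        "writable": os.access(path, os.W_OK),
        "error": None,
    }
    try:
        # Same standard-library call that fails in the st2api traceback.
        with tempfile.TemporaryFile(dir=path) as handle:
            handle.write(b"probe")
    except OSError as exc:
        # Record the symbolic errno name, e.g. ENOENT or EACCES.
        report["error"] = errno.errorcode.get(exc.errno, str(exc.errno))
    return report


if __name__ == "__main__":
    print(check_temp_dir())
```

A healthy result only shows that the directory is usable at the moment of the check; since the failure reportedly returns every 2-3 days, it may be worth running this again once the 500 errors reappear and before restarting st2api.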


(Deepchandna) #10

“It looks like it’s related to gunicorn file based locking (for some reason the lock file doesn’t exist when it tries to delete / release it). Did you change any st2api gunicorn settings like the number of workers / threads or similar?”
Never changed any st2api gunicorn settings.

“Also, how did you install StackStorm - installer script?”
RPM packages.

“Is this a VM / server or a container?”
VM.

“If it’s a container, it could be something related to container /tmp isolation and permissions.”
No. It is a VM.

Just FYI, this issue does not occur on the RHEL6 servers.