/ Development  

Archiving finished BPM runtime instances

Hi there AppWorks fans,

Welcome to a new installment of AppWorks tips.

For this week, we’ll dive into the clean-up of old BPM instances. You might already ask yourself questions about this topic. Do you see dramatic numbers in your production environment!? Each started BPM instance in runtime will get a record somewhere in the database (you’ll see!). When you instantiate 25 flows per minute we can make this rough calculation:

  • Nr. of BPM instances in one year: 25 * 525.960 (minutes in one year) = 13.149.000 database entries
  • One BPM instance in the database takes about 2000 chars of data; One character is one byte: 2000 bytes = 2 Kb = 0,002 Mb
  • So, after a year we talk a table size of: 0,002 * 13.149.000 = 26.298 Mb ~ 26 Gb

Interesting to see it in numbers and lucky for you, we’ll do a deep-dive in the cleaning possibilities for these numbers…


Let get right into it…

Spin up your machine and make sure you have a BPM available to try out this post with me. I have a bpm_test available saved in the ‘bpms’ folder of my generic project. Just a clean and simple BPM with one activity. I run it a couple of times (after a save and publication) with the <F12> key directly from the BPM designer panel. The PIM will show us something like this:

bpmarchive_001

Hmmm…the image shows my old test results from a try-out BPM. Trust me; the real test looks the same with the correct process name. This indicates I’m still a human person! 😁

Time for some investigation on where this information storage happens in the database!? I have PostgreSQL running on my RHEL VM which makes it possible to simply grab the ‘Insert’ statements with a quick log_statement setting:

1
2
3
4
5
6
sudo vi /var/lib/pgsql/11/data/postgresql.conf
# Set this value and save it:
log_statement = 'all'

systemctl restart postgresql-11
sudo tail -999f /var/lib/pgsql/11/data/log/postgresql-{day}.log | grep insert

After this setting we see insert statements passing by (when starting new instances…duh!) in these tables process_instance_data and process_instance. Time to have a closer look with HeidiSQL with these select statements:

  • SELECT * FROM process_instance_data
  • SELECT instance_id, process_name, start_time, end_time, description, type, user_name, status FROM process_instance

Looks like we have a clean process_instance_data table after the completion of the BPM; the process_instance still contains valuable information after BPM completion. This is one of my rows:

1
2
3
4
5
6
7
8
instance_id: "0800270e-e5c1-a1ec-ab9c-dca1858128a8"
process_name: "nl-bos/bpms/bpm_test"
start_time: "1648457140797"
end_time: "1648457140896"
description: "bpm_test"
type: "Run from Process Designer"
user_name: "cn=awdev@awp,cn=organizational users,o=appworks_tips,cn=cordys,cn=defaultInst,o=mydomain.com"
status: "COMPLETE"

So, now we know the storage location of our BPM information! How about cleaning it up? For this we have an interesting artifact called ‘Process Archival Manager’ from our own organization space (where we also deployed the BPM!).

bpmarchive_002

Create a new ‘Archive Policy’ like this:

bpmarchive_003

Jump to the ‘Time Line’ tab for this policy and make it ‘Older than 0 Days’. This includes the just completed BPM instances. Save the policy and do your first ‘Manual’ run from the context-menu. When done, hit that refresh icon and see the result:

bpmarchive_004

You can double-check the PIM on this one and also the exposed database table!

At the moment, I have four cleaned flows. Click the ‘blue’ link in the ‘Archive Log View’ panel to download the archived ZIP-file!

What is available in this file:

  • log.xml: Archive info with AWP data source information, and the next three related files
  • log-{transaction_id}: Contains the running flow instance IDs of the BPM (template/model)
  • {transaction_id}: Contains more details on the flow instances (incl. the activities, variables, message map, and owners)
  • DesignTimeDataCopy_{transaction_id}: Contains the BPM template/model information itself

Back to the properties of our policy again and jump into the ‘Schedule’ tab where we can set several schedule-types. Like a bi-daily one:

bpmarchive_005

The schedule is inactive by default! From the context menu you can make it ‘Activated’. Will it run at that moment in time? This depends on the ‘Scheduling’ service container in the ‘System’ space…Have a look and make sure it’s running, so it can pick up our defined schedule.

You can even open the ‘Schedule Manager’ artifact to view the schedule ID but do not undeploy it from this artifact! Always use the ‘Process Archival Manager’ for this action! That’s just my experience.


Open questions

Time to get an answer on some interesting questions I asked myself during the above exploration:

Where is the ZIP file eventually stored?

I see that one BPM entry makes the ZIP 5Kb (my four instances give a ZIP of 6Kb). So, that makes one entry roughly 1 / 3 ~ 0,33Kb! This is smaller than the previous database entry, but where is the ZIP stored and will it be removed too?

A little Chrome developer-console skills on the archivemanager.caf file brings me to a function called function showLink(fileObject) {...}. This function builds a download URL like this: http://192.168.56.107:8080/wcpdev/cpc/admin/archive/Clean completed_2022-03-29_14.43.53.4353.zip. From experience, we all know it corresponds with this file location on the server: /opt/opentext/AppWorksPlatform/defaultInst/webroot/wcpdev/cpc/admin/archive/Clean completed_2022-03-29_14.43.53.4353.zip

So, a directory to monitor: sudo ls -ltr /opt/opentext/AppWorksPlatform/defaultInst/webroot/wcpdev/cpc/admin/archive/

For your information on the CAF file:

Is the file removed when we delete the log view entry from the ‘Process Archival Manager’ perspective? Trust me…It’s indeed removed from that file location on the server! 😅

What can we do with this ZIP?

A valid question!? Only I don’t have a direct answer…Archive it externally to a larger data-lake for further AI-analytics? On the other hand, we can also back up the process_instance table from a database perspective and clean up the table manually!? It’s all up to you, and you’re not the only one!

An interesting call from PostgreSQL (just for my own registration):

1
pg_dump --host localhost --port 5432 --username postgres --format plain --verbose --file "/home/sysadmin/process_instance.dump" --table public.process_instance appworks_db`

This is my dump for the same BPM instances information.


THERE IS MORE (an after-update on my own post!)

After some sleepless nights I found the holy grail on those ZIP files! Or I just #RTFM (Chapter 19 of the admin guide!) 🤗

How about view/restoring those archived process instances! Is this possible? Ohw yeah…it’s possible! Let’s have a look at the artifact called ‘Archive Administration’ within our own organization. You will start with an unavailable service group error to manage requests for this artifact. To fix this, we quickly create a new (never used from my standpoint) service container within the ‘System Resource Manager’ using this input:

  • Type: Archive Administration
  • Group name: archive_admin_service_group
  • At “the one” service interface available
  • Service name: archive_admin_service_container
  • Startup automagically
  • Assign to OS process (recommended by OT)
  • Connect the ‘Cordys System’ database from the ‘System’ space/organization

Once the service container is up and running, you get a second error from the ‘Archive Administration’. This time about unavailable tables in our database (the one selected in our service container). Time to add those tables through psql:

1
2
3
4
5
sudo -u postgres psql
\c appworks_db
\i /opt/opentext/AppWorksPlatform/defaultInst/components/archiveframework/database/createscripts/BPM_CASE_ARCHIVAL_POSTGRES.sql
\i /opt/opentext/AppWorksPlatform/defaultInst/components/archiveframework/database/createscripts/NOTIFICATION_ARCHIVAL_POSTGRES.sql
\q

One of the created tables in these scripts is process_instance_arc which we would like to monitor from a HeidiSQL perspective via a select statement.

Back to the ‘Archive Administration’ artifact where we can now import our ZIP-file to reload the archived process instances:

bpmarchive_006

During my own exploration I see a strange behavior as the loading of the ZIP file inserts entries back into the process_instance table. I would expect to have insert statements on the process_instance_arc table!? A bug? My environment? To solve this I copied the rows from the process_instance table to the process_instance_arc table. Both tables have the same column definitions!

Eventually you’ll get an overview of archived process instances from the ‘View archived instances’ tab where you can also apply a filter to find the correct information about your process instance:

bpmarchive_007

NICEEE! 😎

A final thing to know is the archive webservices availability through the webservice-interface-runtime-reference (scrabble?) for Method Set Archive Administration (categorized under the package ‘Cordys Business Process Engine’).

bpmarchive_008

This runtime reference exposes webservices with SOAP messages like this (cleaned from namespace declarations):

1
2
3
4
5
6
7
8
9
10
11
12
<SOAP:Envelope>
<SOAP:Body>
<GetArchivedProcessInstanceSummary>
<ns0:Query>
<ns0:Select distinct="">
<ns0:QueryableObject alias="">InstancesSummary</ns0:QueryableObject>
<ns0:Field alias="" function="">ProcessName</ns0:Field>
</ns0:Select>
</ns0:Query>
</GetArchivedProcessInstanceSummary>
</SOAP:Body>
</SOAP:Envelope>

Or another example like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
<SOAP:Envelope>
<SOAP:Body>
<GetArchivedProcessInstances>
<ns0:Query>
<ns0:Select distinct="">
<ns0:QueryableObject>PROCESS_INSTANCE</ns0:QueryableObject>
<ns0:Field>INSTANCE_ID</ns0:Field>
<ns0:Field>PROCESS_NAME</ns0:Field>
<ns0:Field>START_TIME</ns0:Field>
<ns0:Field>STATUS</ns0:Field>
</ns0:Select>
<ns0:Filters>
<ns0:And>
<ns0:EQ field="PROCESS_NAME">
<ns0:Value>nl-bos/bpms/bpm_test</ns0:Value>
</ns0:EQ>
<ns0:In field="STATUS">
<ns0:Value>COMPLETE</ns0:Value>
</ns0:In>
</ns0:And>
</ns0:Filters>
<ns0:OrderBy>
<ns0:Property direction="desc">START_TIME</ns0:Property>
<ns0:Property direction="desc">INSTANCE_ID</ns0:Property>
</ns0:OrderBy>
</ns0:Query>
</GetArchivedProcessInstances>
</SOAP:Body>
</SOAP:Envelope>

That’s it my friends…We’ll wrap it up with a “DONE”! Always nice to see how things work in the back-end. Again an interesting dive at database- and filestore-level. Share your thoughts in the comments, and I see you in the next one…

Don’t forget to subscribe to get updates on the activities happening on this site. Have you noticed the quiz where you find out if you are also “The AppWorks guy”?