Table of Contents
Tools Required
Roles and Responsibilities on Duty
WhatsApp Template
...
Grafana Dashboards:
Monnify BackOffice - Switching Providers:
ATLAS UI - adding providers
Skype Channels
NALA <> Moniepoint
MONNIFY <> SQUAD
Monnify vs Fidelity Virtual account
Payattitude / Teamapt
PagerDuty (access granted by Simpa Saiki)
GCP access (Logs Explorer, Workloads)
WhatsApp groups
VGG
Baxi
Wema-Monnify
Sterling Monnify
Coral pay
TSE Monnify
Monnify operations group
Slack channels
apm-monitoring-alerts
Grafana-monnify-alerts
...
S/N | Panels | Implications | Issues | Threshold | Escalation |
---|---|---|---|---|---|
1 | Disbursement (Rsp Time) | The average response time per transaction from the provider | | > 4 seconds; NIP Success Rate < 94% | TSE |
2 | Pending Disbursements (Total) | The count of transactions currently pending, caused by the following: | | > 100 transaction count (above 10 mins) | TSE |
3 | Outflow (MPT, Sterling, Wema, Fidelity) | These providers are monitored because we are also integrated with them for “Collections”. Hence, downtime on this panel means downtime on the corresponding “Collections” panels. | | < 60% | TSE |
4 | Super Merchant Panels | Baxi, NALA, VGG, Abeg Tech, and Palmpay are super merchants that use Monnify’s disbursement API. | | Last Transaction > 1 hour | TSE |
5 | Balances | These are the disbursement account balances. | Balances < 300mil for | < 300 million | TSE |
6 | ATLAS Providers Success Rates | Transactions are failing. Resolution is to turn on other providers, e.g. ISW, Habari Pay, ETZ, Hydrogen Pay. | Success rate < 95-90% | < 94%; a specific bank on the provider is < 50% | TSE |
7 | Disbursement Performance - By Banks (10m) | Transactions are failing on that specific bank | Bank is encountering technical issues | Success rate on RED, especially for major banks | Send communication to critical stakeholders (Monnify operations group, TSE) |
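
The numeric thresholds in the table above can be expressed as a simple check, which may help when wiring alerts or sanity-checking a reading by hand. The sketch below is illustrative only: the `PanelReading` class, its field names, and the `breached_thresholds` function are hypothetical and not part of any Monnify or Grafana API. Only the threshold values come from the table, and the balance is compared in the same (unstated) unit the table uses.

```python
from dataclasses import dataclass


@dataclass
class PanelReading:
    """Hypothetical snapshot of the disbursement panels (names are illustrative)."""
    response_time_seconds: float        # Disbursement (Rsp Time)
    nip_success_rate_pct: float         # NIP Success Rate
    pending_over_10_mins: int           # Pending Disbursements older than 10 mins
    outflow_success_rate_pct: float     # Outflow (MPT, Sterling, Wema, Fidelity)
    mins_since_last_super_merchant_txn: float  # Super Merchant Panels
    disbursement_balance: float         # Balances, same unit as the table


def breached_thresholds(r: PanelReading) -> list[str]:
    """Return the table rows whose threshold is breached; all escalate to TSE."""
    breaches = []
    if r.response_time_seconds > 4:
        breaches.append("Disbursement (Rsp Time) > 4 seconds")
    if r.nip_success_rate_pct < 94:
        breaches.append("NIP Success Rate < 94%")
    if r.pending_over_10_mins > 100:
        breaches.append("Pending Disbursements > 100 (above 10 mins)")
    if r.outflow_success_rate_pct < 60:
        breaches.append("Outflow < 60%")
    if r.mins_since_last_super_merchant_txn > 60:
        breaches.append("Super Merchant last transaction > 1 hour")
    if r.disbursement_balance < 300_000_000:
        breaches.append("Balance < 300 million")
    return breaches


# Example reading in which only the response time is out of range.
reading = PanelReading(5.2, 97.0, 12, 88.0, 4.0, 450_000_000)
for breach in breached_thresholds(reading):
    print("Escalate to TSE:", breach)
```

Rows 6 and 7 are left out of the sketch because their thresholds depend on per-provider and per-bank readings and on the panel colour rather than on a single number.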
...
S/N | Panels | Implications | Issues | Ideal Threshold | Escalation |
---|---|---|---|---|---|
1 | Kafka Retry Queue & Kafka Queue Backlog | Shows the count of posting & settlement entries pending execution | Delayed Job Execution / Blocked Job Service | > 1,000 (Red) *This threshold applies only before and after 10pm. Reason: at 10pm, posting and settlement entries are being processed, so the count may be high. | TSE |
2 | Unsettled OLAM Transactions | These are the volume and value of settlement transactions pending for a merchant (OLAM) | Will be executed when the Kafka Retry Queue has been processed | Will be processed after 10pm | TSE |
3 | MJS - In Progress, Being Processed, & New | These are panels for monnify-job-service | If Job-Services are blocked | MJS - Being Processed > 1,000 | TSE |
4 | Monnify Metabase Replica lag | This is the time gap between the Monnify-live Database and the Replica | N/A | > 60 seconds (monitor the spike before escalating) | TSE and critical stakeholders (DBA) |
5 | Unsent Webhook Notifications | Webhook notifications not sent to merchants | Webhook notifications not sent to merchants | | TSE |
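
Two rows in the table above carry conditions beyond a plain number: the Kafka threshold is suspended around the 10pm processing run, and replica lag should be watched before escalating. A minimal sketch of how those two rules could be encoded follows. All function and constant names are hypothetical. The table states only "10pm", so the one-hour window end used here is a placeholder assumption, as is the reading of "monitor the spike" as "every recent sample is above the limit".

```python
from datetime import time

KAFKA_BACKLOG_LIMIT = 1_000        # from the table: > 1,000 (Red)
REPLICA_LAG_LIMIT_SECONDS = 60     # from the table: > 60 seconds

# Assumption: the table only says "10pm"; the end of the window is a placeholder.
PROCESSING_WINDOW_START = time(22, 0)
PROCESSING_WINDOW_END = time(23, 0)


def in_processing_window(now: time) -> bool:
    """True while posting and settlement are assumed to be running."""
    return PROCESSING_WINDOW_START <= now < PROCESSING_WINDOW_END


def should_escalate_kafka_backlog(pending_count: int, now: time) -> bool:
    """Escalate a high backlog only outside the 10pm processing window."""
    return pending_count > KAFKA_BACKLOG_LIMIT and not in_processing_window(now)


def should_escalate_replica_lag(recent_lag_seconds: list[float]) -> bool:
    """Escalate only when the lag stays above the limit across all recent samples.

    A single spike that recovers does not trigger escalation (assumed reading
    of "monitor the spike before escalating").
    """
    return bool(recent_lag_seconds) and all(
        lag > REPLICA_LAG_LIMIT_SECONDS for lag in recent_lag_seconds
    )


print(should_escalate_kafka_backlog(1_500, time(14, 0)))
print(should_escalate_kafka_backlog(1_500, time(22, 30)))
print(should_escalate_replica_lag([10, 120, 15]))
```

Adjust the window bounds and the number of lag samples to match how the posting and settlement run actually behaves in production.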
OTHERS
Transactions stuck on atlas MJS (Monnify-Job-Service)
At certain times, the job queueing system (atlas-monnify-job-service) on atlas-service gets clogged by pending transactions, errors, or other causes. This affects disbursements sent from Monnify-disbursement-service to atlas-service. Monitor the panels below to catch these instances.
...