Table of Contents
...
Grafana Dashboards:
Monnify BackOffice - Switching Providers:
ATLAS UI - adding providers
Skype Channels
NALA <> Moniepoint
MONNIFY<> SQUAD
Monnify vs Fidelity Virtual account
Payattitude /Teamapt
Pager duty (Access granted by Simpa Saiki)
GCP access (Log explorer, Workloads)
WhatsApp groups
VGG
Baxi
Wema-Monnify
Sterling Monnify
Coral pay
TSE Monnify
Monnify operations group
Slack channels
apm-monitoring-alerts
Grafana-monnify-alerts
...
S/N | Panels | Implications | Issues | Threshold | Escalation |
---|---|---|---|---|---|
1 | Disbursement (Rsp Time) | The average response time per transaction from the provider | | > 4 seconds; NIP Success Rate < 94% | TSE |
2 | Pending Disbursements (Total) | It is the count of transactions currently pending and is caused by the following: | | > 100 transaction count (above 10 mins) | TSE |
3 | Outflow (MPT, Sterling, Wema, Fidelity) | These providers are monitored because we are also integrated with them for “Collections”. Hence, when there is downtime on this panel, there will be downtime on the corresponding “Collections” panels. | | < 60% | TSE |
4 | Super Merchant Panels | Baxi, NALA, VGG, Abeg Tech, and Palmpay are super merchants that use Monnify’s disbursement API. | | Last Transaction > 1 hour | TSE |
5 | Balances | These are the disbursement account balances. | Balances < 300mil for | < 300 million | TSE |
6 | ATLAS Providers Success Rates | Transactions are failing. Resolution is to turn on other providers, e.g. ISW, Habari Pay, ETZ, Hydrogen Pay, etc. | Success rate < 95-90% | < 94%; specific bank on the provider is < 50% | TSE |
7 | Disbursement Performance - By Banks (10m) | Transactions are failing on that specific bank | Bank is encountering technical issues | Success rate on RED, especially for major banks | Send communication to critical stakeholders (Monnify operations group, TSE) |
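As a rough illustration of how the disbursement thresholds in the table above could be checked programmatically, the Python sketch below flags panels that cross their limits. The `DisbursementSnapshot` class, its field names, and the `breaches` function are hypothetical and not part of Monnify or Grafana; only the numeric limits come from the table.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DisbursementSnapshot:
    """Hypothetical container for one reading of the disbursement panels."""
    avg_response_seconds: float          # Disbursement (Rsp Time)
    nip_success_rate: float              # percent
    pending_over_10_min: int             # pending transactions older than 10 minutes
    outflow_success_rate: float          # percent, one outflow provider panel
    minutes_since_last_super_txn: float  # Super Merchant "Last Transaction"
    balance: float                       # disbursement account balance
    atlas_success_rate: float            # percent, one ATLAS provider


def breaches(s: DisbursementSnapshot) -> List[str]:
    """Return the panels whose reading crosses the threshold in the table."""
    found = []
    if s.avg_response_seconds > 4:
        found.append("Disbursement (Rsp Time)")
    if s.nip_success_rate < 94:
        found.append("NIP Success Rate")
    if s.pending_over_10_min > 100:
        found.append("Pending Disbursements (Total)")
    if s.outflow_success_rate < 60:
        found.append("Outflow")
    if s.minutes_since_last_super_txn > 60:
        found.append("Super Merchant Panels")
    if s.balance < 300_000_000:
        found.append("Balances")
    if s.atlas_success_rate < 94:
        found.append("ATLAS Providers Success Rates")
    return found
```

Any non-empty result would be escalated to TSE, as the Escalation column indicates.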
See the failure reasons section for the reasons behind the failures.
How to check failure reasons by bank on Metabase
-- Failed disbursements for one bank, most recent first
SELECT created_on, transaction_reference, transaction_status, bank_name,
       provider_reference, provider_response_message
FROM disbursement_transaction
WHERE transaction_status IN ('failed')
  AND bank_name = 'Ecobank Nigeria Plc'  -- replace with the bank under review
  -- cut-off is (NOW() minus 3 minutes) plus 1 hour
  AND created_on < DATE_ADD(DATE_SUB(NOW(), INTERVAL 3 MINUTE), INTERVAL 1 HOUR)
ORDER BY created_on DESC;
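Once the query has returned rows, it can help to tally the failure reasons. The Python sketch below assumes the result was downloaded from Metabase as CSV with the same column names as the query; the `top_failure_reasons` function and the sample rows are made up for illustration and are not real transactions.

```python
import csv
import io
from collections import Counter


def top_failure_reasons(csv_text: str, limit: int = 5):
    """Count provider_response_message values in a CSV export of the query."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row["provider_response_message"] for row in reader)
    return counts.most_common(limit)


# Made-up sample rows, for illustration only.
sample = """created_on,transaction_reference,transaction_status,bank_name,provider_reference,provider_response_message
2024-01-01 10:00:00,REF-1,failed,Ecobank Nigeria Plc,P-1,Sample reason A
2024-01-01 10:01:00,REF-2,failed,Ecobank Nigeria Plc,P-2,Sample reason A
2024-01-01 10:02:00,REF-3,failed,Ecobank Nigeria Plc,P-3,Sample reason B
"""

print(top_failure_reasons(sample))
# → [('Sample reason A', 2), ('Sample reason B', 1)]
```

The most frequent reason is usually the first thing to include when sending communication to the stakeholders.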
COLLECTION
For each provider, monitor and review the transaction notifications received per bank provider and confirm that traffic is arriving as expected. Whenever performance drops, reach out to the provider promptly to address the issue.
...
S/N | Panels | Implications | Issues | Ideal Threshold | Escalation |
---|---|---|---|---|---|
1 | Kafka Retry Queue & Kafka Queue Backlog | Shows the count of posting & settlement entries pending execution | Delayed Job Execution / Blocked Job Service | > 1,000 (Red). This threshold should only apply before and after 10pm. Reason: by 10pm, posting and settlement are being processed, so the count may be high. | TSE |
2 | Unsettled OLAM Transactions | These are the volume and value of settlement transactions pending for a merchant (OLAM) | Will be executed when the Kafka Retry Queue has been processed | Will be processed after 10pm | TSE |
3 | MJS - In Progress, Being Processed, & New | These are panels for monnify-job-service | Job services are blocked | MJS - Being Processed > 1,000 | TSE |
4 | Monnify Metabase Replica lag | This is the time gap between the Monnify-live Database and the Replica | N/A | > 60 seconds (monitor the spike before escalating) | TSE and critical stakeholders (DBA) |
5 | Unsent Webhook Notifications | Webhook notifications not sent by merchants | Webhook notifications not sent by merchants | | TSE |
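Two thresholds in the table above carry conditions: the Kafka backlog limit does not apply around 10pm, and replica lag should be watched before escalating. The Python sketch below shows one way to encode both. The function names, the 22:00 to 23:00 settlement window, and the three-consecutive-readings rule are assumptions chosen for illustration; the source only gives the > 1,000 and > 60 seconds limits.

```python
from datetime import time
from typing import Sequence


def kafka_backlog_breached(count: int, now: time,
                           window_start: time = time(22, 0),
                           window_end: time = time(23, 0)) -> bool:
    """Apply the > 1,000 limit only outside the (assumed) settlement window."""
    in_window = window_start <= now < window_end
    return count > 1000 and not in_window


def replica_lag_sustained(lag_seconds: Sequence[float],
                          threshold: float = 60,
                          consecutive: int = 3) -> bool:
    """True when the last `consecutive` readings all exceed the threshold."""
    recent = lag_seconds[-consecutive:]
    return len(recent) == consecutive and all(v > threshold for v in recent)
```

For example, `kafka_backlog_breached(1500, time(14, 0))` returns `True`, while the same count at `time(22, 15)` returns `False` under the assumed window.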
OTHERS
Transactions stuck on atlas MJS (Monnify-Job-Service)
At certain times, the job queueing system (atlas-monnify-job-service) on the atlas-service gets clogged due to pending transactions or errors, among other reasons. This affects disbursements sent from Monnify-disbursement-service to atlas-service. Below are the panels to monitor to detect these instances.
...