---
title:
source:
author:
  - Shen Wei
published:
created:
description:
tags:
---

## Introduction

This document helps team members outside Cloud Ops understand the services and tools currently available for Cloud Application troubleshooting, so that they can apply them flexibly in different scenarios and depend less on Cloud Ops engineers.

Our goal is clear: to build a more efficient DevOps ecosystem that delivers better service to our customers.

**Please note that the services and tools described below require approval and authorization, and are currently limited to members of the Cloud Ops and R&D CPE teams.**

## Troubleshooting as a Service

### Access Environment as a Service

#### Access to Customer Tenant

We provide a way to enter the customer's tenant. During troubleshooting, you can access the customer's environment directly, observe the symptoms of the problem first-hand, and make the right judgment.

#### Access to ESM Farm BO, IDM, UCMDB JMX console

We provide a way to apply for temporary user access to each farm management console:

- BO Suite Admin
- ESM IDM Admin
- UCMDB Super Admin to UCMDB JMX Console

### Log Collection as a Service

We provide a comprehensive log collection automation tool that collects the logs of a specific module within a specific time period. Users can choose filtering conditions that fit the scenario, which helps locate problems more accurately and avoids the extra effort caused by excessively large logs.

### Check Configuration

### Monitoring as a Service

#### Unified Monitoring via pre-defined Grafana Dashboard

We provide a rich set of monitoring data from the farm implementation to support troubleshooting.
Currently we use Grafana as the monitoring UI to present the monitoring data of the farm implementation:

- AWS CloudWatch Data Source
  - Real-time infrastructure monitoring (AWS EKS/EFS/RDS)
- Prometheus Data Source
  - Real-time application-level metrics exposed by Prometheus
- Database query Data Source
  - Key application indicators obtained through database queries
- Containerized/K8S
  - Key monitoring data for the containerized product (container, node, pod, etc.)

#### Service Availability Health Page

### Log Analysis as a Service

### BI Reporting as a Service

### Unplanned Change Request as a Service

### Other Services