Preparing the Monthly BOMB Report
Please note that the bomb report is a summary of the Executive Summary Reports in the SAAZ Portal. It will probably take 1 to 2 hours PER BOMB REPORT to complete correctly. The bomb report is an intelligent human’s summary of all the computer-generated data. You will go through reams and reams of reports to compile the nicely summarized bomb report.
We call it the bomb report because the clients only have to look for bombs on the page and realize “Bombs are Bad”. So if there are no bombs, there is no risk to the client. This hardly ever happens though.
We also use this report to prove to the client that we are being proactive and offering value to them, so this report is of immense importance in keeping our MSP clients paying.
Also, I need to read the bomb report and UNDERSTAND what I need to discuss with the clients: issues they may have, things that could be done better, etc.
Finally, the report is a chance for us to remediate issues that you pick up and report back on how they were fixed. Let us use Envirowaste as an example.
In the following pages we are going to cover a lot of reports that you need to go through and a lot of tests that you need to perform. All these reports are computer generated. We need your intelligence to go through the reports, comment on the issues observed and put a plan in action to remediate the problems, either by sorting them out yourself or by assigning calls and tasks to people who can.
Server
The first thing you look at in the portal is the quick access setting on the dashboard for the servers and the workstations. As you can see, the most important things for the server (Disk Space, Antivirus and Security updates) are OK. What would we do if they were not OK? We would start an immediate chat with online support to fix it for us. This way they can get on with the fix while we finish the report. Please always remember to check the ticket and resolution from the NOC.
We have a critical user impact alert; if you click on the red cross you will see the following: the server restarted 14 days ago. Why? I don’t know. This is water under the bridge.
Next problem is the Critical Non Impact alert. Click on the red cross and you will see:
A Paging file operation. This is more important. Please go through the event log and AT tickets to see if this is a recurring issue. Please make some recommendations. You will see that the NOC gives you some suggestions on how to fix the problem. Follow the suggestions. If you do not come right, assign it to the NOC to fix. Once the bomb report is done there should be no red crosses left!
Next, let’s check out the server’s backups. Take remote control of the server.
You’ll be asked for a username and a password for the server. You will find this in the NIF.
Immediately we can see that there is a problem with this server’s memory utilization. It only has 4GB and all of it is being used. Make a mental note of this. Click on Remote Control.
Open the SBS Console and click on Backup and Server Storage. You will see that the backup has failed. Reason? The backup drives are offline. Please log a call on Zendesk and email everyone at the office. This is unfortunate as we need to test a restore (just one or 2 files) to make sure that it is working.
Please note that we use different backup software at different sites. This should be in the NIF. If it isn’t, please let us know so that we can update it.
Desktops
Now let’s have a look at the Desktops.
There are a couple of problems here. On some PCs the Anti-virus signatures are not updating, and some PCs have no Anti-virus at all. Chris-XP is one of these. You can see that the PC is online and that there is no Anti-virus installed on it. Big problem. Log a call for a technician to sort this out ASAP. Send a mail to all the technicians about this and CC Arno.
Secondly, ELIAS1 does not have Anti-virus. His PC is offline and the AV is not installed, so it could be that this PC has been decommissioned. To check this, go to Configuration->House Keeping->No Contact(Desktop)->Site Name, change the days to 30 and click on the magnifying glass.
You will now receive a report of all PCs that have not contacted the Portal for the past 30 days. It is safe to assume that these PCs are not on the network anymore. ELIAS1 is not in the list, so it has contacted the Portal. Even if you change the days to 15, it still isn’t in the list, so it has contacted the NOC in the last 15 days. Could it be that Elias is sick or that a virus infected his PC and screwed it up? Either way, log a call for a technician to phone the user and find out.
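If you want to sanity-check the No Contact logic outside the portal, here is a minimal sketch in Python. It assumes you have copied each PC's last-contact date by hand; the dates and the machine name OLD-RECEPTION are made up for illustration, and the portal itself may calculate this differently.

```python
from datetime import date, timedelta

def no_contact(last_contact, today, days):
    """Return the machines whose last portal contact is older than `days` days."""
    cutoff = today - timedelta(days=days)
    return sorted(name for name, seen in last_contact.items() if seen < cutoff)

# Hypothetical last-contact dates; the real values come from the portal report.
last_contact = {
    "ELIAS1": date(2024, 5, 25),
    "CHRIS-XP": date(2024, 6, 1),
    "OLD-RECEPTION": date(2024, 3, 10),  # made-up machine name
}
today = date(2024, 6, 1)

stale_30 = no_contact(last_contact, today, 30)
stale_15 = no_contact(last_contact, today, 15)
print("No contact in 30 days:", stale_30)
print("No contact in 15 days:", stale_15)
```

A PC that is missing from both lists, like ELIAS1 here, has been in contact recently and is probably still in use, so it needs a phone call rather than a decommission.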
The yellow dots under Anti-virus mean that the PC is running an unsupported or freeware Anti-virus. Please check whether there is already a call open for this. If not, log a call and send an email to everyone asking whether there is a reason for this.
If there are any other problems, such as patches not being rolled out, Smart HDD errors or low free disk space, a call needs to be logged and everyone in the team needs to be emailed. Always make sure first that there is not already an open call for the problem.
Now go to S&CC->Desktop Monitoring Script Dashboard and select the PCs with exceptions thrown. The exceptions are for:
Memory Available less than 24 MB Memory Monitoring (Free MB) 220 134
As you can see, there are 9 PCs whose CPU utilization is running at more than 90% and 8 PCs that have less than 5% free disk space. Let’s click on it to see if any of these PCs belong to Envirowaste.
We’re lucky; none of the PCs at Envirowaste is taking strain. If there was a problem, you would need to log a call and email everyone in the Team. Please make sure that there isn’t a current call for this issue that is still open. If there is, escalate the call to the dispatch person. We would probably phone the user and ask them if we could have a look at their PC, and then assign the call back to you to take over the PC and sort out the problem.
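The exception check can be sketched as a small filter. This is only an illustration of the thresholds mentioned above (CPU over 90%, free disk under 5%, available memory under 24 MB); the record layout and the PC names are hypothetical and not how the portal stores its data.

```python
# Thresholds taken from the text; the record layout below is hypothetical.
CPU_MAX_PERCENT = 90
DISK_MIN_FREE_PERCENT = 5
MEMORY_MIN_FREE_MB = 24

def exceptions_for(pc):
    """Return the list of threshold breaches for one PC record."""
    problems = []
    if pc["cpu_percent"] > CPU_MAX_PERCENT:
        problems.append("CPU above 90%")
    if pc["disk_free_percent"] < DISK_MIN_FREE_PERCENT:
        problems.append("free disk space below 5%")
    if pc["memory_free_mb"] < MEMORY_MIN_FREE_MB:
        problems.append("available memory below 24 MB")
    return problems

def site_exceptions(pcs, site):
    """Map PC name -> breaches, for PCs at `site` that have at least one breach."""
    result = {}
    for pc in pcs:
        if pc["site"] == site:
            problems = exceptions_for(pc)
            if problems:
                result[pc["name"]] = problems
    return result

# Made-up sample data for illustration only.
pcs = [
    {"name": "ENV-PC1", "site": "Envirowaste", "cpu_percent": 35,
     "disk_free_percent": 42, "memory_free_mb": 512},
    {"name": "ENV-PC2", "site": "Envirowaste", "cpu_percent": 60,
     "disk_free_percent": 18, "memory_free_mb": 300},
    {"name": "DET-PC1", "site": "Detect", "cpu_percent": 97,
     "disk_free_percent": 3, "memory_free_mb": 150},
]

envirowaste = site_exceptions(pcs, "Envirowaste")
detect = site_exceptions(pcs, "Detect")
print(envirowaste)
print(detect)
```

An empty result for a site means no call needs to be logged for it; anything else is a candidate for a call, after checking that one is not already open.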
Network
The Network report is generated by doing a Speedtest on the server and also looking at the firewall log.
Please take note that we have various ISPs. Telkom (as in this case) should be unshaped bandwidth and fairly quick. Some clients can go up to 10 Mbps, but anything under 2 Mbps is cause for concern and needs to be looked at. Upload speeds of less than 0.25 Mbps are also a problem. Because these clients use RDP to access their apps, there could be performance issues (40 kbps per session; max 6 sessions on this connection). Run the speedtest a couple of times to get an average.
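To make the arithmetic concrete, here is a rough sketch that averages a few speedtest runs and checks them against the limits above. It assumes the speeds are in megabits per second (as Speedtest reports them), that 1 Mbps is 1000 kbps, and that the upload link is what limits the number of RDP sessions; the sample readings are made up.

```python
from statistics import mean

MIN_DOWNLOAD_MBPS = 2.0      # anything under this is cause for concern
MIN_UPLOAD_MBPS = 0.25       # upload below this is a problem
RDP_KBPS_PER_SESSION = 40    # per-session figure quoted in the text

def summarize(download_runs, upload_runs):
    """Average several speedtest runs and compare them with the limits."""
    download = mean(download_runs)
    upload = mean(upload_runs)
    return {
        "download_mbps": download,
        "upload_mbps": upload,
        "download_ok": download >= MIN_DOWNLOAD_MBPS,
        "upload_ok": upload >= MIN_UPLOAD_MBPS,
        # Assumption: upload is the bottleneck for RDP sessions.
        "max_rdp_sessions": int(upload * 1000 // RDP_KBPS_PER_SESSION),
    }

# Made-up readings from three speedtest runs.
result = summarize(download_runs=[3.8, 4.1, 4.0], upload_runs=[0.24, 0.27, 0.27])
print(result)
```

With an upload of about 0.25 Mbps (250 kbps) and 40 kbps per session, you get roughly 6 sessions before the link is saturated, which matches the limit quoted above.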
If possible also do a pingtest (www.pingtest.net). You need Java enabled on the browser for this to work.
Do a couple of tests to get an average. Anything under a B is cause for concern and could influence network performance, especially VOIP apps like Skype as well as RDP sessions.
Disaster Planning
Simply put “No DRP Plan in Place”. We’ll complete this.
Security
Now let’s have a quick look at the firewall reports. We go through these reports to see if any users are abusing the system, if there are torrents being downloaded that hog the bandwidth etc. I am going to use Detect as an example as they use a lot of the advanced functionality found in the Untangle server.
Have a look at the WAN Failover. Not all sites use this functionality. WAN Failover allows the Untangle server to fail over to another WAN connection, such as a wireless connection, if the primary connection fails. If the site has this functionality, first check that it is working. You will see that there are 2 connections available. This implies that it is working. Click on “Settings” on the WAN Failover Module.
As you can see, the External (Primary) link is up 91.6% of the time (not very good, which is why we have the failover) and the DMZ or secondary connection is up only 53.7% of the time. This is really bad and not a very good failover solution. The client should consider changing the failover link because it is so unreliable.
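To show the client what these percentages mean in practice, you can turn them into hours of downtime. The sketch below assumes the uptime figures cover a 30-day window; the module may measure over a different period, so treat the results as a rough guide.

```python
HOURS_IN_PERIOD = 30 * 24  # assumption: figures cover a 30-day window

def downtime_hours(uptime_percent, period_hours=HOURS_IN_PERIOD):
    """Convert an uptime percentage into hours of downtime over the period."""
    return (100.0 - uptime_percent) / 100.0 * period_hours

primary_down = downtime_hours(91.6)    # External (Primary) link
secondary_down = downtime_hours(53.7)  # DMZ / secondary link

print(f"Primary link down for about {primary_down:.1f} hours")
print(f"Secondary link down for about {secondary_down:.1f} hours")
```

Under that assumption the primary link was down for roughly 60 hours and the secondary for roughly 333 hours, which is a much stronger argument for replacing the failover link than the percentages alone.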
Close the WAN Failover Module and click on the Reports Module’s Settings.
Click on “View Reports”
Get the whole month’s (30 days) report.
Copy all the data in the large red block. This will go directly into our Bomb Report. Now let’s have a look at some interesting figures. I will highlight the important data.
Platform scanned 33.30 GB and 1355346 sessions
Spam Blocker scanned 8027 messages and detected and processed 2889 spam messages
Phish Blocker scanned 8027 messages and detected and processed 21 phish messages
Spyware Blocker scanned 126005 web hits and blocked 2597 activities
Bandwidth Control analyzed 33304.83 MB
WAN Failover detected 348 WAN failures and saved the network from 249135.8 seconds of downtime
Virus Blocker scanned 134676 documents and detected and blocked 358 viruses
Intrusion Prevention scanned 556524 sessions and detected 0 attacks of which 0 were blocked
Protocol Control scanned 556524 sessions and detected 167375 protocols of which 1512 were blocked
Firewall scanned 556524 sessions and blocked 0 according to the rules
The most important stat to me is that WAN Failover saved the company 249135.8 seconds (roughly 69 hours) of downtime. Mention this in the Bomb Report. Also mention any other facts that you find interesting, such as the number of spam messages and viruses that were blocked.
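If you want to double-check the figures before quoting them in the Bomb Report, a short script can pull the numbers out of the copied text and do the conversions. This is a sketch that assumes the summary lines are worded exactly as in the example above; other Untangle versions may phrase them differently.

```python
import re

# Three lines copied from the example summary above.
summary = """\
Spam Blocker scanned 8027 messages and detected and processed 2889 spam messages
WAN Failover detected 348 WAN failures and saved the network from 249135.8 seconds of downtime
Virus Blocker scanned 134676 documents and detected and blocked 358 viruses
"""

wan = re.search(
    r"WAN Failover detected (\d+) WAN failures and saved the network from ([\d.]+) seconds",
    summary,
)
spam = re.search(
    r"Spam Blocker scanned (\d+) messages and detected and processed (\d+) spam",
    summary,
)

failures = int(wan.group(1))
saved_seconds = float(wan.group(2))
saved_hours = saved_seconds / 3600                 # seconds -> hours
avg_outage_minutes = saved_seconds / failures / 60
spam_percent = 100 * int(spam.group(2)) / int(spam.group(1))

print(f"WAN Failover covered {failures} failures, about {saved_hours:.1f} hours")
print(f"Average outage: about {avg_outage_minutes:.1f} minutes")
print(f"Spam share of scanned mail: about {spam_percent:.1f}%")
```

For this example the saved downtime comes to about 69.2 hours, with an average outage of about 12 minutes, and about 36% of scanned mail was spam.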
Let’s quickly have a look if any of the users have been abusing the system. Click on Protocol Control and have a look if there are any weird protocols being used. At Detect someone is using a Bittorrent Client. This could cause huge amounts of traffic on the network. Make a note of this for the Bomb Report.
Finally go out of the reports section and click on Config->Email.
Note that there are users that have large amounts of mails in their quarantine folder. They probably do not know how to empty their quarantine. Make a note of this.
General Reports
There are a couple of General Reports that need to be saved as well. I am using Detect as an example for these reports. I have found that these reports work better in Internet Explorer as there are some custom settings that you have to put in to make it work. I do not have documentation for Chrome or Mozilla.
In Internet Explorer, Click on Tools->Internet Options. Select Trusted Sites and Click on the Sites Button.
Make sure the following sites are added to the list of trusted sites
Also make sure that popup blocker is not blocking these sites.
In the Portal click on “Reports”
Select “Executive Summary”
Select the Word document for the site, as you may want to edit this document slightly.
The Executive Summary report is a couple of pages that we give to clients/executives so that they do not have to wade through reams of reports. Go through the report and format it so that everything fits nicely (resize graphics if they overflow to other pages etc). An example of the report is available to download here. My comments will be highlighted.
Original Article Posted at http://www.msppractice.com/