37 Discuss the basics of troubleshooting.
As a network administrator, much of your time will go into troubleshooting the network. The problem demanding attention may be as minor as a printer that does not print, or as major as a failed server that affects the working of the entire network. Troubleshooting is not only fixing the problem; it also involves isolating the problem and taking preventive and corrective action. The abilities of the administrator should not be limited to technical knowledge; they should include the ability to look at the problem creatively and trace it to its source. Communication skills add to the abilities of the network administrator, and experience in handling networks and troubleshooting adds to an administrator's value. Troubleshooting is a tough job because so many variables are involved. Recognizing the problem is the first step an administrator takes towards troubleshooting.
Troubleshooting a server is different from troubleshooting a workstation. The principles of troubleshooting remain essentially the same, but the steps taken to resolve the problem differ. The basic differences are:
| Aspect | Server | Workstation |
|---|---|---|
| Pressure | Troubleshooting a server means hundreds or even thousands of anxious users. The pressure on the network administrator is immense. | Troubleshooting a workstation means a single workstation with a single anxious user. The pressure is still there, but it is not as high as in the case of a server. |
| Planning | It is not easy to work on a server. Days, weeks or even months of planning may be involved. | Working on a workstation does not require much planning. The work can be done during the lunch break or after working hours. |
| Time | For some organizations, non-availability of the server means loss of revenue. Such servers are expected to be in working order 24x7. | Loss of time on a workstation does mean a loss, but the stakes are not as high as in the case of a server. |
| Problem Determination | In the case of servers, trial-and-error methods are not appreciated. The administrator is expected to have specific answers for specific problems. | The problem can often be determined by trial and error, or even by swapping components. |
| Expertise | The people who manage servers are expected to have specialized knowledge of the software as well as the hardware. | The demand for expertise is not as strict in the case of workstations. |
Table 10: Differences between Troubleshooting a Server and Troubleshooting a Workstation
The other considerations that play an important role while troubleshooting a workstation or a server are:
Time: The administrator must consider whether the problem occurs during a peak period or a relatively quiet one. During a high-use period, a quick fix that gets the systems working again is usually preferred, because finding the exact problem and an exact solution can be time consuming.
Network Size: The strategy for troubleshooting a network of ten systems differs from the strategy for a network of a hundred systems.
Support: A major concern of a network administrator is how many professionals are in the department. Does support exist only in the form of the hardware and its manufacturer, or are there more people to consult and report to? The number of personnel in the department depends on the size of the organization.
Knowledge of the Network: Before starting to troubleshoot, the network administrator should learn the layout of the network and its topologies. The strategies to be adopted are largely determined by these factors.
Technologies Used: It is important to know the various technologies combined to set up the network. It is always better for an administrator to acknowledge an inability to handle a particular problem than to fiddle with a system and add to the problems.
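Steps to be followed for Troubleshooting: To troubleshoot a network, it is essential to follow the steps and the procedure carefully. The general steps to be followed while troubleshooting are:
- Gather information: identify symptoms and problems;
- Identify the affected areas of the network;
- Determine if anything has changed;
- Establish the most probable cause;
- Determine if escalation is necessary;
- Create an action plan and solution, identifying potential effects;
- Implement and test the solution;
- Identify the results and effects of the solution;
- Document the solution and the entire process.
The following sections examine each area of the troubleshooting process.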
Step 1: Information Gathering: Identify Symptoms and Problems
Troubleshooting with limited information makes the job harder. It could result in working on the wrong problem instead of the real one. The first step is to acquire as much knowledge as possible about the system, the network, the problem, and the symptoms of the problem. Good communication skills and patience go a long way in acquiring this knowledge. Information can be gathered from three sources:
Information from the Computer: A computer can go a long way in helping the administrator identify what and where the problem is. The administrator should know what the error messages of a particular operating system stand for. Information may not always be in a readable form; it may be cryptic, and the administrator should know how to interpret it. Systems can also be configured to generate log files after a hardware or software failure. These files can be viewed to see what, where, when and how the error occurred.
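As a minimal sketch of how such log files might be reviewed, the Python snippet below picks out the lines that record a failure and reports when each occurred. The line format (date, time, level, message), the list of failure levels, the function name find_errors and the sample lines are all assumptions made for illustration; real operating systems and applications use their own log formats.

```python
from typing import Iterable, List, Tuple

# Hypothetical severity keywords; adjust to whatever the real log uses.
FAILURE_LEVELS = {"ERROR", "CRITICAL", "FATAL"}


def find_errors(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (timestamp, level, message) for every line at a failure level.

    Assumes each line looks like: "<date> <time> <LEVEL> <message>".
    Lines that do not match this shape are skipped.
    """
    hits = []
    for line in lines:
        parts = line.strip().split(maxsplit=3)
        if len(parts) < 4:
            continue  # not in the assumed format
        date, time, level, message = parts
        if level.upper() in FAILURE_LEVELS:
            hits.append((f"{date} {time}", level.upper(), message))
    return hits


# Invented sample lines, for illustration only.
sample_log = [
    "2024-01-15 09:12:03 INFO Service started",
    "2024-01-15 09:14:47 ERROR Disk controller timeout on drive 2",
    "2024-01-15 09:15:10 WARNING High memory usage",
    "2024-01-15 09:15:55 CRITICAL Print spooler stopped unexpectedly",
]

for timestamp, level, message in find_errors(sample_log):
    print(timestamp, level, message)
```

Only the ERROR and CRITICAL lines are printed, which answers the "what" and "when" questions; the "where" and "how" usually need the surrounding log lines as well.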
Information from the User: Communication skills play an important role in drawing information from end users. Users often have limited technical knowledge, so explaining the exact problem or understanding technical jargon may not be easy for them. Interview the end user and be a patient and attentive listener. An administrator would ask questions revolving around:
- Frequency of the error;
- Applications being used at the time the problem arose;
- Any past problems the system has been facing;
- Any modifications made by the user. The modification could be something as small as a new screensaver or a new game that has been installed;
- Error messages that the users encountered in the absence of the administrator.
Observation Techniques: Keen eyes, keen ears and a keen nose can often help detect the problem easily and quickly. Observation works best for connectivity errors. Unplugged cables or smoke coming from the back of the system do not need error messages to convey the problem.
Step 2: Identify the Affected Areas of the Network
It is important to find out the exact area that is affected by the problem. The problem could be confined to a single location or spread over multiple locations. The extent of the problem defines the strategy to be followed while troubleshooting. Problems affecting a single user normally revolve around that user's workstation.
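One way to reason about the extent of a problem is sketched below in Python. It assumes the administrator already has a list of hosts reporting the problem and a map of which hosts sit on which network segment. The function name classify_scope, the segment map and the scope labels are hypothetical, chosen only to illustrate the idea.

```python
from typing import Dict, Set


def classify_scope(failing: Set[str], segments: Dict[str, Set[str]]) -> str:
    """Classify how widely a problem is spread.

    failing  -- names of hosts reporting the problem
    segments -- hypothetical map of segment name -> hosts on that segment
    """
    if not failing:
        return "no problem reported"
    if len(failing) == 1:
        return "single workstation"
    # Segments that contain at least one failing host.
    touched = [name for name, hosts in segments.items() if hosts & failing]
    if not touched:
        return "hosts not in any known segment"
    if len(touched) == 1:
        return f"single segment: {touched[0]}"
    return "multiple segments"


# Invented layout, for illustration only.
segments = {
    "floor1": {"pc01", "pc02", "pc03"},
    "floor2": {"pc11", "pc12"},
}

print(classify_scope({"pc02"}, segments))          # single workstation
print(classify_scope({"pc01", "pc03"}, segments))  # single segment: floor1
print(classify_scope({"pc01", "pc12"}, segments))  # multiple segments
```

A single-workstation result points towards the workstation itself, while a single-segment or multi-segment result points towards shared equipment or the server.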
Step 3: Determine if Anything Has Changed
It is always a working system that develops problems. A problem may exist at the level of a workstation or of the entire network. The likelihood that the system or the network is facing problems because of some change made to it is very high. Computers can hardly do anything on their own. It is human intervention that makes them work, and human intervention that brings them to a standstill. It is important to find out what changes were made to the system in order to ascertain the exact problem. The changes could take place at the level of:
Network: The networks of today are dynamic in nature. They are ever evolving and ever growing. This growth places additional load on the network and may cause problems. For example, hubs are continuously added or removed, and routing information changes. All changes made to the network must be documented to facilitate troubleshooting.
Server: The network administrator's chief area of work revolves around the server. At times even a routine task done on the server may start an array of problems. The common situations that could result in server-related problems are:
- After user accounts have been changed, some users are unable to access the network or a database;
- Permissions for access to data have been changed, and as a result users are unable to access files of a specific format;
- A new application on the server is causing problems;
- Hardware on the server may need to be changed, configured or reconfigured.
Workstation: It is not always the administrator who introduces changes to the systems on the network. Many times the end users initiate these changes, unaware of the problems the changes can cause. The following should be considered while working at the workstation level:
- If the system is unable to access the network, check the network settings for changes;
- Printing problems limited to a single system may be caused by a changed configuration;
- Free software and trial versions are often the cause of problems.
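If changes at the network, server and workstation levels are documented, the record can be searched when a problem appears. The Python sketch below shows one possible shape for such a record and a helper that lists the changes made shortly before the problem began. The Change class, the recent_changes function, the seven-day default window and the sample entries are assumptions for illustration, not a prescribed format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Change:
    when: datetime
    level: str        # "network", "server", or "workstation"
    description: str


def recent_changes(log: List[Change], problem_start: datetime,
                   window: timedelta = timedelta(days=7)) -> List[Change]:
    """Return changes made within `window` before the problem began, newest first."""
    earliest = problem_start - window
    hits = [c for c in log if earliest <= c.when <= problem_start]
    return sorted(hits, key=lambda c: c.when, reverse=True)


# Invented entries, for illustration only.
change_log = [
    Change(datetime(2024, 3, 1, 10, 0), "network", "Replaced hub on second floor"),
    Change(datetime(2024, 3, 10, 16, 30), "server", "Changed file permissions on a share"),
    Change(datetime(2024, 3, 11, 9, 0), "workstation", "User installed trial software"),
    Change(datetime(2024, 3, 12, 8, 0), "server", "Installed new application"),
]

problem_start = datetime(2024, 3, 11, 12, 0)
for change in recent_changes(change_log, problem_start, window=timedelta(days=3)):
    print(change.when, change.level, change.description)
```

The most recent change before the problem started is listed first, since it is usually the first one worth examining.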
Step 4: Establish the Most Probable Cause
There could be multiple reasons for a single problem. If proper information is gathered in the right manner, many of the probable causes can be eliminated. It is best to start with the simplest explanation and move up, since the simplest explanation is often the most likely one. Even so, the first guess may not be the right one. It might take a few trials before the real cause can be ascertained.
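The idea of starting with the easiest explanation and moving up can be sketched in Python as follows. Each candidate cause carries a rough effort score and a check that confirms or rules it out. The find_cause function, the 1 to 5 effort scale and the sample candidates (with checks hard-coded to fixed answers) are hypothetical and stand in for real diagnostic steps.

```python
from typing import Callable, List, Optional, Tuple

# (description, effort to check on a 1-5 scale, check function)
Candidate = Tuple[str, int, Callable[[], bool]]


def find_cause(candidates: List[Candidate]) -> Optional[str]:
    """Try candidate causes from easiest to hardest; return the first confirmed."""
    for description, _effort, check in sorted(candidates, key=lambda c: c[1]):
        if check():
            return description
    return None  # nothing confirmed: gather more information or escalate


# Invented candidates; each check is a stand-in that returns a fixed answer.
candidates = [
    ("Faulty network card", 4, lambda: False),
    ("Cable unplugged", 1, lambda: False),
    ("Wrong IP settings", 2, lambda: True),
]

print(find_cause(candidates))  # Wrong IP settings
```

The cable is checked first because it is the cheapest to verify; the network card is never examined because an easier cause was confirmed before reaching it.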
Step 5: Determine if Escalation Is Necessary
Administrators are not expected to know the solution to every problem, but they are expected to work through every problem, even if it means taking additional help. It is for the administrator to determine whether escalating the issue is necessary. The escalation procedure may be formally specified in organizations that require strict rules and guidelines, or it may be an informal arrangement. The best policy is to start with the closest help and move outwards. Talk with the members of your team and try to gain from their experience. If the problem is serious, it is best to inform the head of the department.
Step 6: Create an Action Plan and Solution, Identifying Potential Effects
This stage is one step short of implementing the solution. Work out a plan for implementing it. The plan should be detailed, covering when, how and for how long the server will be taken offline, the support services that will be required during that time, and checks on their availability. Pay attention to the smallest detail to ensure that the system is up and running quickly with the least amount of damage.
Step 7: Implement and Test the Solution
With the action plan in place, perform a dry run of the solution. It is always wise to implement the solution in small stages rather than all at once. This makes it easier to retrace the steps in case of failure.
A good network administrator does not assume that the network is in working condition just because the solution has been implemented. Test the system again and again until you are completely satisfied. Testing the solution is not as easy as it sounds and may require the involvement of other systems, departments and users. A true test of the system is when all the users are logged on and able to perform their tasks without any hindrance.
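The advice to implement a solution in stages, test after each stage and retrace the steps on failure can be sketched in Python as below. Each stage is described by an apply action, an undo action and a test. The apply_in_stages function and the sample stages, which only modify a dictionary, are hypothetical stand-ins for real changes to a system.

```python
from typing import Callable, List, Tuple

# Each step is (name, apply, undo, verify); all four are hypothetical.
Step = Tuple[str, Callable[[], None], Callable[[], None], Callable[[], bool]]


def apply_in_stages(steps: List[Step]) -> List[str]:
    """Apply steps one at a time, testing after each.

    If a step fails its test, undo every applied step in reverse order
    and return an empty list. Otherwise return the names of applied steps.
    """
    done: List[Step] = []
    for step in steps:
        name, apply, undo, verify = step
        apply()
        done.append(step)
        if not verify():
            for _name, _apply, undo_prev, _verify in reversed(done):
                undo_prev()
            return []
    return [s[0] for s in done]


# Invented state standing in for a real system, for illustration only.
state = {"driver": "old", "config": "old"}

steps = [
    ("update driver",
     lambda: state.update(driver="new"),
     lambda: state.update(driver="old"),
     lambda: state["driver"] == "new"),
    ("update config",
     lambda: state.update(config="new"),
     lambda: state.update(config="old"),
     lambda: False),  # simulate a failed test
]

print(apply_in_stages(steps))  # []
print(state)                   # {'driver': 'old', 'config': 'old'}
```

Because the second stage fails its test, both stages are undone in reverse order and the state returns to where it started. In practice the tests would involve real users and other systems, as described above.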
Step 8: Identify the Results and Effects of the Solution
The chances of a solution fixing one problem and causing another are very high. It is not always possible for the administrator to anticipate the problems that a solution could bring along with it. Actions such as adding clients or replacing hubs can all have unpredictable results. It is best to assume that the network will experience changes once the problem is fixed, and to work to keep those changes to a minimum if they are negative in nature.
Step 9: Document the Solution and the Entire Process
The last step is the most neglected of the entire process. It involves recording the problem and the solution that was implemented. The information recorded should cover:
- The date when the solution was actually implemented, as this helps keep track of the changes and of the problems faced since the last major change;
- The reason behind the steps that were taken, that is, the problem that had to be resolved;
- The steps taken to rectify the problem;
- The results of every step that was taken;
- The professional or team members who dealt with the problem.
It is best to record the information as the work proceeds, or at the earliest available opportunity. It is recommended that even failed attempts to troubleshoot be recorded, as they serve as information on what not to do with a certain problem.
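The items listed above can be captured in a simple record structure. The Python sketch below is one possible layout, not a standard format: the class names TroubleshootingRecord and StepRecord, their fields and the sample entries are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class StepRecord:
    action: str
    result: str
    succeeded: bool


@dataclass
class TroubleshootingRecord:
    implemented_on: date       # date the solution was implemented
    problem: str               # the "why" behind the work
    handled_by: List[str]      # people who dealt with the problem
    steps: List[StepRecord] = field(default_factory=list)

    def log_step(self, action: str, result: str, succeeded: bool) -> None:
        """Record a step as soon as it is taken, including failed attempts."""
        self.steps.append(StepRecord(action, result, succeeded))

    def failed_attempts(self) -> List[StepRecord]:
        """Steps that did not work: a note on what not to repeat."""
        return [s for s in self.steps if not s.succeeded]


# Invented example entries, for illustration only.
record = TroubleshootingRecord(
    implemented_on=date(2024, 3, 12),
    problem="Users on the second floor cannot reach the file server",
    handled_by=["A. Admin"],
)
record.log_step("Restarted the file service", "No change", succeeded=False)
record.log_step("Replaced the faulty hub", "Access restored", succeeded=True)

for attempt in record.failed_attempts():
    print("Did not work:", attempt.action)
```

Keeping failed attempts in the same record as the successful fix makes it easy to see later what was already tried for a similar problem.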