Usually, once the Incidents/SR module is operational, the next module to implement is Problems. This article covers the application of the Problem Management process in Octopus. To learn more about the basic ITIL® concepts for this process, see the Problem Management - ITIL® Process article.
There is a fundamental difference between an incident and a problem. Simply put, an incident is an unplanned situation that must be resolved as soon as possible because it directly affects users' ability to work. Problem Management, meanwhile, looks at incidents from a different angle: it seeks to understand their root cause, which could also be the cause of other incidents. While assignees involved in Incident Management are busy resolving incidents, those involved in Problem Management primarily look for ways to prevent incidents from occurring or to reduce their impact, so that users are not affected by interruptions. To help put these concepts into context, we will use a scenario.
«Recently, Service Desk analysts have been reporting a higher number of calls about workstations not responding (frozen). One user mentions that she did not have this problem in the past, but now it happens regularly. Service Desk analysts perform the usual troubleshooting for this kind of incident, and so far a reboot restores the situation. This is done for all users reporting similar symptoms. Each time, an incident is created and resolved by rebooting the computer.»
Understandably, reducing or eliminating these repeated incidents would benefit both the IT group and the users experiencing these interruptions.
Here are three important rules:
- An incident should not be kept open to be analysed later instead of being set to «Resolved»; proceed with its resolution anyway. However, it is appropriate to create a problem request and link all related incidents to it.
- An incident is not closed without being resolved just because a problem exists for it. The incident must still be resolved. If no solution is available, the incident must be escalated to a specialised or higher-level group; help can sometimes be sought from a supplier if the expertise is not available internally. But you need to keep looking for a solution to eliminate the incident, even a temporary one. The Service Desk will keep creating and resolving incidents as they arise, as long as a permanent solution has not been deployed (usually through a change).
- When a change is required to resolve an incident, a problem is not necessarily created. If the incident analysis was sufficient to find the cause and the solution, a change can be applied directly. Do not burden your service with a problem that would add no value. A relationship between the incident and the change will explain the situation.
While a problem is more commonly created after several incidents of the same nature, ITIL® mentions that a single incident is enough to justify creating a problem record, as long as you suspect an underlying cause. Many criteria can point to a potential problem: the same symptoms, recurrence at a particular time of day (week, month), new incidents following a change or a deployment, etc. Here is an illustration that shows the relationship between incident, problem and change.
Problem Creation in Octopus
Create a problem manually
- From the Problems module, click the Create a problem action
- Document the problem with relevant information, identify the CI(s) and add related incidents. Note that the Group and Subject fields are mandatory for creation
- Click OK to save the problem. A unique reference number is automatically allocated by the system
Description of the fields
Octopus manages both a problem status and a problem lifecycle status. There are two problem statuses, Open and Closed; each includes a set of problem lifecycle statuses:
- Open: includes the lifecycle statuses from New to Change in process
- Closed: includes Closed and Inactive
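The grouping of lifecycle statuses under the two problem statuses can be pictured as a simple lookup. The following Python sketch is only a mental model: the lifecycle names come from the lifecycle status table in this article, while `OPEN_LIFECYCLE`, `CLOSED_LIFECYCLE` and `problem_status` are hypothetical names, not an Octopus API.

```python
# Hypothetical model: Octopus does not expose this mapping as code.
# Lifecycle statuses grouped under the "Open" problem status, in order.
OPEN_LIFECYCLE = [
    "New (to validate)",
    "Planning in progress",
    "Search for root cause in progress",
    "Search for workaround in progress",
    "Search for a final solution in progress",
    "Change proposed (awaiting authorization)",
    "Change in process",
]

# Lifecycle statuses grouped under the "Closed" problem status.
CLOSED_LIFECYCLE = ["Closed (resolved problem)", "Inactive"]


def problem_status(lifecycle_status: str) -> str:
    """Return the problem status ("Open" or "Closed") for a lifecycle status."""
    if lifecycle_status in OPEN_LIFECYCLE:
        return "Open"
    if lifecycle_status in CLOSED_LIFECYCLE:
        return "Closed"
    raise ValueError(f"Unknown lifecycle status: {lifecycle_status!r}")


print(problem_status("Search for workaround in progress"))  # Open
print(problem_status("Inactive"))  # Closed
```

An advanced search on the problem status therefore returns every record whose lifecycle status belongs to the matching group.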
These problem statuses are accessible from the Home module through an advanced search on the Status field. For example, by selecting Problem as the request type and Open as the status, you could get a list similar to the following:
Problem lifecycle statuses are specific to the problem record and evolve throughout its lifecycle, so make sure the advanced search is done in the right place:
The problem lifecycle statuses are described below.
|Problem lifecycle status||Description|
|1. New (to validate)||Upon detection and recording. This is the initial information-gathering phase.|
|2. Planning in progress||Problem categorization and prioritization phase.|
|3. Search for root cause in progress||Problem analysis and diagnostic phase, focused on finding the underlying cause. Activities related to this phase can be combined in a task, and several experts can be consulted. Identifying the cause is important to understand the source of the problem and helps find a solution.|
|4. Search for workaround in progress||Problem analysis and diagnostic phase, focused on finding a workaround that can be used to resolve incidents. Because the Incident Management objective is to restore the situation as soon as possible, looking for a workaround is a priority.|
|5. Search for a final solution in progress||Problem analysis and diagnostic phase, focused on finding a permanent solution that eliminates recurring incidents. This step often involves implementing a change.|
|6. Change proposed (awaiting authorization)||Once the cause and a permanent solution have been identified, a change is proposed to Change Management to correct the problem.|
|7. Change in process||The problem is waiting on Change Management.|
|8. Closed (resolved problem)||Once the change is deployed, the problem documentation is updated, incidents still active are resolved and, if necessary, a review is done.|
A choice must be made for inactive problems from the following list:
Priority / Impact / Urgency
Problems should be prioritized the same way as incidents, using the same criteria. However, other factors should also be considered:
- Frequency and impact of related incidents
- Availability of a temporary solution (workaround)
- Criticality of the problem from a service, customer and infrastructure perspective:
- Can the system be recovered, or does it need to be replaced?
- How much will it cost?
- How many people, with which skills, will be needed to fix the problem?
- How long will it take to fix the problem?
- How extensive is the problem (how many CIs are affected)?
- What are the business impacts?
From Octopus, click the blue arrow to select the Impact and Urgency. The Priority is accessible directly.
Impact, urgency and priority levels are detailed in the table below. Note that they are not configurable in the Reference Data Management.
|Impact Levels||Urgency Levels||Priority Levels|
4. Non-critical CI Repair
Categorization / Assignment / Subject
Categorization: the problem categorization is similar to the incident categorization.
Assignment: A problem is usually assigned to a group that has the expertise to find the root cause and the solution to apply, is familiar with the activities of the Problem Management process, and understands the necessary interactions with the Incident, Change and Asset & Configuration Management (CMDB) processes.
Source*: List of items identifying the problem source. We propose the following sources: change result, event log analysis, incident history, observation, operation activities analysis.
Contact*: To document the contact information.
Detection: By default, detection is set to Reactive, which corresponds to a problem whose analysis is based on existing data from incidents, events or impacted CIs. Select Proactive for a problem created for known infrastructure weaknesses or for Information / Warning event types that may generate incidents.
Subject: Type in a subject that is significant and clearly describes the problem.
* The Source and Contact fields can be made available upon request to the Octopus Service Desk.
Several fields are available to document the problem throughout its lifecycle. The documentation captures the information already known about the incidents and the problem, such as symptoms and occurrences over time, but also the actions taken and results obtained as you work on the problem. These fields are particularly important:
- Description: describes the problem and its symptoms more precisely. A problem created from an incident copies the symptoms documented in the incident description into the problem Description field
- Impacts: the impacts that are caused by this problem
- Root Cause: identifies the source that causes or could cause incidents
- Reproduction Scenario: step-by-step scenario that reproduces the behavior or error
- Workaround: documents a temporary solution that allows Incident Management to resolve an incident while Problem Management continues to seek a permanent solution. The problem remains open until a permanent solution is implemented and the problem is solved. In the meantime, you continue to create incidents, apply the workaround and link the incidents to the problem
- Permanent Solution: the permanent solution, implemented through a change, that ensures no more incidents occur. Once it is in place, the problem is ready to be closed
As with service requests and changes, it is possible to add approval, standard and notification tasks to a problem.
For more details concerning tasks, consult the Task Management article.
The Activities tab represents the problem activity log, where all activities are recorded in chronological order (date and time) throughout the problem management steps. Activities in the Problems module can represent, among other things:
- diagnostic activities of the problem
- the resolution activity, which is easy to identify if we use an activity type
- communications, by sending activities by email to other Octopus users, suppliers, end-users
We recommend entering activity efforts because they represent the time an Octopus user worked on the problem. This time represents a cost and is added to the total cost of the problem. With this information, you can estimate whether the time already invested, or the additional time estimated, justifies continuing the effort; entering this data supports decision making.
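As a rough illustration of how logged efforts roll up into a problem cost, here is a minimal Python sketch. It assumes efforts are recorded in minutes and valued at a single hourly rate; the `Activity` class, `total_problem_cost` function and the rate are hypothetical, and Octopus may compute costs differently.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    """One entry in a problem's activity log (hypothetical model)."""
    description: str
    effort_minutes: int  # time worked by an Octopus user on this activity


def total_problem_cost(activities: list[Activity], hourly_rate: float) -> float:
    """Sum all recorded efforts and convert them to a cost.

    The single hourly_rate is an assumption made for illustration.
    """
    total_minutes = sum(a.effort_minutes for a in activities)
    return round(total_minutes / 60 * hourly_rate, 2)


log = [
    Activity("Collect event logs from frozen workstations", 90),
    Activity("Reproduce the freeze in the lab", 120),
    Activity("Call with hardware supplier", 30),
]
print(total_problem_cost(log, hourly_rate=75.0))  # 300.0
```

Comparing this running total with an estimate of the remaining effort is one way to frame the "is it worth continuing?" decision described above.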
It is possible to link all types of requests (incident, problem, event, change) to a problem and to qualify the relationship you want to establish. For example, if you link a change to a problem, the possible relationships can be:
- Is the cause of
- Is the solution of
- Is related to
Other logical relationships are presented according to the request type selected.
Enter the CI that is at the source of the problem. The CI in cause identified in an incident is not necessarily the one that causes the problem.
Attached files Tab
As in all other Octopus modules, you can attach files, add shortcuts, or add an attached file from the content of the clipboard. The Show activity attached files checkbox consolidates all attachments, including the activity ones.
Octopus keeps track of date and time when a problem is created and updated in the History tab. The information presented indicates the data added or updated (the initial value and the new value), the modification date / time and the Octopus user who made it.
When closing a problem, you can specify a closure categorization by selecting an activity type.
From the Activities tab:
- Click on Add
- Select a type that corresponds to the problem closure
- Application update
- CI decomissioned
- External supplier intervention
- Hardware repair
- Internal technical intervention
- Document the closure activity in the Work breakdown section
- Save with OK
- Change the problem status to Closed
To find out how to configure the activity types, see the Activities in Octopus Wiki article.
Create a problem record from an incident
- Open an existing incident, from which you want to create a problem
- Click on Create a problem from this incident action - category, assignment, subject, description and CI in cause are automatically copied into the problem record
- Fill in the problem creation form according to known information, link other incidents if applicable
- Click OK to save the problem record
- The incident that was the source of the problem is automatically linked to the problem with the relationship Is an occurrence of
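The copy-and-link behaviour described in these steps can be sketched in Python as follows. The field names mirror the ones listed above, but the `Incident` and `Problem` classes and `create_problem_from_incident` are hypothetical; they only model what the action does and are not how Octopus implements it.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    """Hypothetical subset of an incident record."""
    number: int
    category: str
    assignment: str
    subject: str
    description: str
    ci_in_cause: str


@dataclass
class Problem:
    """Hypothetical subset of a problem record."""
    category: str
    assignment: str
    subject: str
    description: str
    ci_in_cause: str
    # (relationship label, incident number) pairs, read from the incident's side
    links: list[tuple[str, int]] = field(default_factory=list)


def create_problem_from_incident(incident: Incident) -> Problem:
    """Copy the fields the article lists, then link the source incident."""
    problem = Problem(
        category=incident.category,
        assignment=incident.assignment,
        subject=incident.subject,
        description=incident.description,
        ci_in_cause=incident.ci_in_cause,
    )
    # The source incident "Is an occurrence of" the new problem.
    problem.links.append(("Is an occurrence of", incident.number))
    return problem


inc = Incident(
    number=1042,
    category="Hardware > Workstation",
    assignment="Service Desk",
    subject="Workstation frozen",
    description="Workstation stops responding; a reboot restores it",
    ci_in_cause="WS-0117",
)
prb = create_problem_from_incident(inc)
print(prb.subject, prb.links)  # Workstation frozen [('Is an occurrence of', 1042)]
```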
Create a problem record from a problem
The Create a problem from... action in the Problems module requires a plugin that can be added to your Octopus database. To get it, simply make a request to the Octopus Service Desk.
By clicking this action, the system opens a problem creation form and copies the following information:
- Impact, urgency, priority
- Category / subcategory
- Content of the Description field in the Documentation tab
You can create a new problem from an existing problem, and you can also create problem templates by including the word Model in the subject. By excluding these models from your lists of current problems, you can create a separate list that contains only problem models. These could represent, for example:
- Major Problem - high priority, assignment to the System Administrator group, with a procedure indicated in description
- Network Problem - assignment to the Network Administrator group, with a procedure indicated in description
Create a change record from a problem
The Create a change from this problem action in the Problems module requires a plugin that can be added to your Octopus database. To get it, simply make a request to the Octopus Service Desk.
By clicking this action, the system opens a change creation form, and once the change is recorded, the problem request is automatically linked to the change record in the Requests tab.
Create a Known Error
Octopus has a built-in list that displays Known Errors. This list is based on specific criteria.
A problem appears in the Known Errors list when it meets all of the following criteria:
- The status of the problem is not Closed
- The root cause has been identified
- A workaround has been identified
Once these criteria are met, the problem will appear in the Known Error list and the Octopus users will apply the workaround when appropriate.
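The three Known Error criteria amount to a simple predicate. The Python sketch below applies them to a few illustrative records; `ProblemRecord`, `is_known_error` and `known_errors` are hypothetical names, and an empty text field is assumed to mean that the root cause or workaround is not documented.

```python
from dataclasses import dataclass


@dataclass
class ProblemRecord:
    """Hypothetical subset of a problem record."""
    number: int
    status: str      # problem status: "Open" or "Closed"
    root_cause: str  # Root Cause field; empty when not documented
    workaround: str  # Workaround field; empty when not documented


def is_known_error(p: ProblemRecord) -> bool:
    """Not closed, root cause identified and workaround identified."""
    return (
        p.status != "Closed"
        and p.root_cause.strip() != ""
        and p.workaround.strip() != ""
    )


def known_errors(problems: list[ProblemRecord]) -> list[int]:
    """Numbers of the problems that would appear in the Known Errors list."""
    return [p.number for p in problems if is_known_error(p)]


problems = [
    ProblemRecord(7, "Open", "Faulty graphics driver", "Roll back the driver"),
    ProblemRecord(8, "Open", "Faulty graphics driver", ""),
    ProblemRecord(9, "Closed", "Expired certificate", "Renew the certificate"),
]
print(known_errors(problems))  # [7]
```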
Resolve the linked occurrences
The Resolve the linked occurrences action resolves or closes all incidents linked to the problem with the Has for occurrence relationship.
When the action is selected, a window opens in which you write the resolve/close activity.
The resolve/close activity will be added to each of the linked requests.
If a workaround is documented in the problem, the This occurrence has been resolved with problem #xxx's workaround solution message is displayed in the resolve/closure activity.
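The behaviour of this action can be modelled with a short loop. In this Python sketch, which is an illustration under stated assumptions rather than the Octopus implementation, only links using the Has for occurrence relationship are processed, each incident receives the activity text and a new status, and the workaround message is prepended when a workaround is documented. The exact placement of that message, as well as the `LinkedIncident` class and `resolve_linked_occurrences` function, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LinkedIncident:
    """Hypothetical incident linked to a problem."""
    number: int
    status: str
    activities: list[str] = field(default_factory=list)


def resolve_linked_occurrences(
    problem_number: int,
    workaround: str,
    links: list[tuple[str, LinkedIncident]],
    note: str,
    new_status: str = "Resolved",  # or "Closed", depending on the choice made
) -> int:
    """Add the resolve/close activity to each occurrence and update its status.

    Returns the number of incidents updated.
    """
    text = note
    if workaround.strip():
        # Message wording from the article; its placement here is an assumption.
        text = (
            f"This occurrence has been resolved with problem #{problem_number}'s "
            f"workaround solution. {note}"
        )
    updated = 0
    for relationship, incident in links:
        if relationship != "Has for occurrence":
            continue  # other relationship types are not affected by this action
        incident.activities.append(text)
        incident.status = new_status
        updated += 1
    return updated


a = LinkedIncident(1042, "In progress")
b = LinkedIncident(1057, "New")
c = LinkedIncident(1060, "New")
links = [("Has for occurrence", a), ("Has for occurrence", b), ("Is related to", c)]
count = resolve_linked_occurrences(12, "Roll back the graphics driver", links, "Driver rolled back.")
print(count, a.status, c.status)  # 2 Resolved New
```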
Incident Management Contribution
Octopus users involved in Incident Management can contribute to Problem Management in many ways. The first and most important contribution is undoubtedly the rigorous processing of each incident. This includes:
- Proper documentation of the subject, description, troubleshooting steps, actions taken and resolution applied, for each incident
- The confirmation of the categorization and the CI in cause when resolving the incident
- The use, when appropriate, of the Potential Problem checkbox for an incident that could be a potential problem
Incident marked as Potential Problem
Any Octopus user can report an incident as a potential problem by selecting the Potential Problem checkbox and justifying the reason for this action. Since no Octopus permission controls this, we recommend establishing an internal procedure to ensure the effectiveness of incident analysis. Of course, the symptoms, troubleshooting steps and functional escalations to other support levels must be properly documented in the activities. If a workaround temporarily removes the symptoms, it should be mentioned in the incident resolution activity. Link other incidents of a similar nature, as they can be linked to the problem. All these actions and documentation provide elements for analysis and important clues to identify the root cause and possibly a solution to the problem.
To report a potential problem from an incident:
- Check the Potential Problem checkbox, located below the Activity Log
- Add a note to justify the reason
- Link any other incidents of the same nature, if applicable
Note that this action does not create a problem request; it merely identifies incident(s) that could potentially be a problem.
Problem Management Contribution
Documenting the Workaround field of a problem provides Incident Management with a list of temporary solutions for resolving incidents. From the Incidents/SR module, the Find a solution action displays a list of problems for which a workaround is documented: the system searches for a match on the CI and/or its manufacturer, model and categorization, and displays a list of potential workarounds.
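The article does not describe how Find a solution ranks its results. The Python sketch below assumes one plausible approach: keep only problems with a documented workaround and rank them by how many of the four attributes (CI, manufacturer, model, categorization) match the incident. The `CandidateProblem` class, `find_solution` function and the scoring rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CandidateProblem:
    """Hypothetical subset of a problem record used for matching."""
    number: int
    workaround: str
    ci: str
    manufacturer: str
    model: str
    category: str


def find_solution(
    incident: dict[str, str], problems: list[CandidateProblem]
) -> list[tuple[int, str]]:
    """Return (problem number, workaround) pairs, best match first.

    Ranking by the count of matching attributes is an assumption.
    """
    keys = ("ci", "manufacturer", "model", "category")
    scored = []
    for p in problems:
        if not p.workaround.strip():
            continue  # only problems with a documented workaround are candidates
        score = sum(1 for k in keys if incident.get(k) and incident[k] == getattr(p, k))
        if score > 0:
            scored.append((score, p.number, p.workaround))
    scored.sort(key=lambda t: (-t[0], t[1]))  # highest score first, then by number
    return [(number, workaround) for _, number, workaround in scored]


incident = {"ci": "WS-0117", "manufacturer": "VendorA", "model": "Model X",
            "category": "Hardware > Workstation"}
candidates = [
    CandidateProblem(7, "Roll back the graphics driver", "WS-0200", "VendorA", "Model X", "Hardware > Workstation"),
    CandidateProblem(8, "Reseat memory modules", "WS-0117", "VendorA", "Model X", "Hardware > Workstation"),
    CandidateProblem(9, "", "WS-0117", "VendorA", "Model X", "Hardware > Workstation"),
    CandidateProblem(10, "Restart print spooler", "PR-0003", "VendorB", "Model P", "Hardware > Printer"),
]
print(find_solution(incident, candidates))
# [(8, 'Reseat memory modules'), (7, 'Roll back the graphics driver')]
```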
As mentioned above, you must decide which problems are worth pursuing until a permanent solution is found. If you have to modify a CI to resolve a problem, the best practice is to go through a change. Therefore, a problem well documented with its impact, urgency, related incidents or events, related CIs, efforts and costs will contribute to the evaluation and implementation of the change.
Various analysis techniques are available to help teams identify the root cause of an incident and assess options for reducing its impact or resolving it. See the Investigation & Diagnosis activity described in the Problem Management - ITIL® Process article for a short description of these techniques.
Using Octopus, employees working in Problem Management can gather data to observe what happened in the production environment, identify trends and create appropriate problems. There are several possible sources of data in Octopus, including:
- Incidents marked Potential Problem
- Problem > Most problematic CI
- Problem > Most problematic CI models
- Problem > Most problematic CI types
- Number of events (alerts) that generated incidents
- Activity types, configured to identify the cause of an incident at resolution
- Existing problems with their related incidents
Based on these observations and findings, the Octopus user decides whether to create a problem. But Octopus data is not the only input to this decision. A known unstable CI, a critical CI with no redundancy, CIs involved in «Information» or «Warning» events, or a trend that could be corrected are all reasons to create a problem proactively, in order to track CIs that may generate incidents in the short, medium or long term. Other examples could apply. See the Event Management - Octopus Module or Event Management - ITIL® Process articles to learn more about events.
Incidents marked as not significant for problem management
During the analysis phase, the Octopus user working in Problem Management identifies incidents that are not significant for problem management in order to remove them from future searches.
- Select one or more existing incidents
- Click the Mark as not significant for problem management action
- A message informs you that selected incident(s) will be marked as not significant for problem management - this operation is irreversible
- The Potential Problem checkbox is replaced by a note indicating that the incident was marked as not significant for problem management
From the advanced search of the Incidents/SR Module you can access the Problem tab. From there, you can filter incidents with the following options:
- Incidents not associated to a problem
- Exclude incidents not significant for problem management
- Incidents marked as potential problems
The Octopus user can therefore tailor the incident search to their needs. The search results include all incidents, from the «new» to the «closed» status.
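The three Problem tab options behave like filters applied to all incidents, whatever their status. The Python sketch below assumes that selected options are combined with a logical AND; the `IncidentRow` class and `search_incidents` function are hypothetical and only illustrate the filtering logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IncidentRow:
    """Hypothetical view of an incident as seen by the advanced search."""
    number: int
    status: str
    linked_problem: Optional[int]  # None when the incident is not associated with a problem
    potential_problem: bool        # Potential Problem checkbox
    not_significant: bool          # marked as not significant for problem management


def search_incidents(
    incidents: list[IncidentRow],
    not_associated_only: bool = False,
    exclude_not_significant: bool = False,
    potential_problems_only: bool = False,
) -> list[int]:
    """Return incident numbers matching every selected option (AND is an assumption)."""
    result = []
    for inc in incidents:  # no status filter: New through Closed are all considered
        if not_associated_only and inc.linked_problem is not None:
            continue
        if exclude_not_significant and inc.not_significant:
            continue
        if potential_problems_only and not inc.potential_problem:
            continue
        result.append(inc.number)
    return result


rows = [
    IncidentRow(1, "Closed", None, True, False),
    IncidentRow(2, "New", 12, True, False),
    IncidentRow(3, "Resolved", None, False, True),
    IncidentRow(4, "In progress", None, False, False),
]
print(search_incidents(rows, not_associated_only=True, exclude_not_significant=True))  # [1, 4]
print(search_incidents(rows, potential_problems_only=True))  # [1, 2]
```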
The advanced search also lets you retrieve incidents related to problems, and vice versa. To do so:
- Open advanced search
- Select Request relationship from the list of result types
- Set the Type (Request 1) field to Incident
- Set the Type (Request 2) field to Problem
You will get a list of incidents related to a problem, which you can save for future reference.
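Conceptually, this search filters request relationship records on the types of their two ends. The Python sketch below models that filter; `RequestRelationship` and `incidents_related_to_problems` are hypothetical names, and the sample records are invented for illustration.

```python
from typing import NamedTuple


class RequestRelationship(NamedTuple):
    """Hypothetical row of a Request relationship search result."""
    type1: str         # Type (Request 1)
    number1: int
    relationship: str  # relationship label, read from Request 1 to Request 2
    type2: str         # Type (Request 2)
    number2: int


def incidents_related_to_problems(rels: list[RequestRelationship]) -> list[RequestRelationship]:
    """Keep rows where Request 1 is an incident and Request 2 is a problem."""
    return [r for r in rels if r.type1 == "Incident" and r.type2 == "Problem"]


rels = [
    RequestRelationship("Incident", 1042, "Is an occurrence of", "Problem", 12),
    RequestRelationship("Change", 88, "Is the solution of", "Problem", 12),
    RequestRelationship("Incident", 1057, "Is an occurrence of", "Problem", 12),
    RequestRelationship("Incident", 1060, "Is related to", "Incident", 1061),
]
print([(r.number1, r.number2) for r in incidents_related_to_problems(rels)])
# [(1042, 12), (1057, 12)]
```

In this model, swapping the two type values gives the "vice versa" search, for relationships recorded with the problem as Request 1.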
The following Octopus permissions relate to Problem Management:
- Access the problem management module
- Create a problem
- Manage the relationship between incidents and problems
- Modify a problem's status
- Modify a problem
- Delete a problem task
To see the list of permissions and a brief description, you can download the following document: Octopus Permission Reference.
Reports and Lists
In the Statistics Module, several reports related to Problem Management are available. Some are management reports, others are used to identify CIs, CI models or CI types that were implicated in incidents or problems, which are useful for Problem Management.
Lists also allow extraction of useful information for Problem Management, for example:
- Known Errors: displays problems with a documented Root Cause and Workaround. This list helps Incident Management resolve incidents faster by making solutions available.
- CI associated to problems: list of CIs related to problems
- Open: list of active problems; by displaying the priority and number of incidents columns, this list provides information on a potential priority change (an increase in related incidents could justify one)
- Closed: list of resolved problems
- Root Cause To Be Identified: problems open for more than one month with no documented root cause; because it is based on exceeding this threshold, the list helps track problems that have not been worked on for some time
- Closed Incident Categories: this list helps Incident Management and Problem Management analyse the categorization selected in incidents and problems, in order to adjust the categorization structure (by modifying or adding categories). Like all lists created in Octopus, it can be exported to Excel.
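The Root Cause To Be Identified list can be thought of as a threshold filter. The Python sketch below assumes that the age of a problem is measured from its creation date and that "one month" means 30 days; the `OpenProblem` class and `root_cause_to_be_identified` function are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class OpenProblem:
    """Hypothetical problem record used by the list filter."""
    number: int
    status: str        # problem status: "Open" or "Closed"
    created: datetime  # creation date (assumed reference point for the age)
    root_cause: str    # Root Cause field; empty when not documented


def root_cause_to_be_identified(
    problems: list[OpenProblem],
    now: datetime,
    threshold: timedelta = timedelta(days=30),  # "one month" assumed to be 30 days
) -> list[int]:
    """Open problems older than the threshold with no documented root cause."""
    return [
        p.number
        for p in problems
        if p.status == "Open"
        and now - p.created > threshold
        and not p.root_cause.strip()
    ]


now = datetime(2024, 6, 30)
problems = [
    OpenProblem(12, "Open", datetime(2024, 5, 1), ""),                # 60 days old, no root cause
    OpenProblem(13, "Open", datetime(2024, 6, 20), ""),               # too recent
    OpenProblem(14, "Open", datetime(2024, 4, 2), "Faulty driver"),   # root cause documented
    OpenProblem(15, "Closed", datetime(2024, 1, 10), ""),             # not open
]
print(root_cause_to_be_identified(problems, now))  # [12]
```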
To support the teams participating in Problem Management, knowledge should be documented to help with the proper operation of the process, including:
- investigations, diagnoses, root cause analysis techniques
- creating / updating workarounds, temporary fixes and resolution
This knowledge can be documented in the Configurations module, in a document-type CI. Teams can then refer to these analysis documents and even use them in a problem. See the How to Manage Procedures in Octopus article, which will guide you in establishing formalized knowledge for Problem Management.
An organization that develops applications in-house may want to record known errors during the development phase (problems with an identified root cause and a documented workaround). If the known error is resolved before the application goes live, the record can be resolved. Otherwise, the known error can be transferred to production by selecting a problem lifecycle status, so that it can be used to resolve incidents. To distinguish known errors in development from known errors in production, separate lists can be created for consultation.
If you want to apply this concept to your development department, make a request to the Octopus Service Desk, which will add the In Development status to the Problems module of your database.
To know more about Major Problems
A major problem is a problem whose severity or impact is such that management decides to review how the situation was handled. A major problem review covers the processes followed, the actions of staff, the tools used and the environment. This review is a learning activity, not a punitive exercise or a criticism. It aims:
- not to judge success or failure
- to try to discover why things happened
- to focus directly on the tasks and goals that were to be accomplished
- to encourage employees to surface important lessons learned
- to share lessons learned with others
The latest version of ITIL® introduced a major problem review activity to prevent recurrence, to verify that the problems marked as closed have effectively eliminated the error, and to retain lessons for the future.
In the Octopus context, the major problem review can be carried out through a standard task added to the problem workflow. It should cover the following elements:
- what was done right
- what was done wrong
- what could be done better next time
- how to prevent the problem from happening again
- identification of lessons learned
Do not hesitate to consult the Task Management article to get more details on task configuration in Octopus.