Grading the quality of evidence and the strength of recommendations
Judgments about evidence and recommendations in healthcare are complex. For example, those who must decide between recommending selective serotonin reuptake inhibitors (SSRIs) and tricyclics for the treatment of moderate depression must agree on which outcomes to consider, which evidence to include for each outcome, how to assess the quality of that evidence, and how to determine whether SSRIs do more good than harm compared with tricyclics. Because resources are always limited and money that is allocated to treating depression cannot be spent on other worthwhile interventions, they may also need to decide whether any incremental health benefits are worth the additional costs.
Systematic reviews of the effects of healthcare provide essential but not sufficient information for making well-informed decisions. Reviewers and people who use reviews draw conclusions about the quality of the evidence, either implicitly or explicitly. Such judgments guide subsequent decisions. For example, clinical actions are likely to differ depending on whether one concludes that the evidence that warfarin reduces the risk of stroke in patients with atrial fibrillation is convincing (high quality) or unconvincing (low quality).
Similarly, practice guidelines and people who use them draw conclusions about the strength of recommendations, either implicitly or explicitly. Using the same example, a guideline that recommends that patients with atrial fibrillation should be treated may suggest that all patients definitely should be treated or that patients should probably be treated, implying that treatment may not be warranted in all patients.
A systematic and explicit approach to making judgments such as these can help to prevent errors, facilitate critical appraisal of these judgments, and improve communication of this information. Since the 1970s a growing number of organizations have employed various systems to grade the quality (level) of evidence and the strength of recommendations. Unfortunately, different organizations use different systems to grade evidence and recommendations. The same evidence and recommendation could be graded as “II-2, B”, “C+, 1”, or “strong evidence, strongly recommended” depending on which system is used. This is confusing and impedes effective communication.
One of the aims of the GRADE Working Group was to reduce unnecessary confusion arising from multiple systems for grading evidence and recommendations. To avoid adding to this confusion by having multiple variations of the GRADE system, we suggest that the criteria below should be met when stating that the GRADE approach was used to assess evidence or develop recommendations. Also, while users may believe there are good reasons for modifying the GRADE system, we discourage the use of "modified GRADE approaches" that differ from the approach described by the GRADE Working Group.
On the other hand, we encourage and welcome constructive criticism of the GRADE approach, suggestions for improvements, and involvement in the GRADE Working Group. Like most scientific approaches to advancing healthcare, the GRADE approach will continue to evolve in response to new evidence and to meet the needs of systematic review authors, guideline developers and other users.
Suggested criteria for stating that the GRADE system was used (updated 2016-04; full pdf version with document history and references):
- The certainty in the evidence (also known as quality of evidence or confidence in the estimates) should be defined consistently with the definitions used by the GRADE Working Group.
- Explicit consideration should be given to each of the GRADE domains for assessing the certainty in the evidence (although different terminology may be used).
- The overall certainty in the evidence should be assessed for each important outcome using four or three categories (such as high, moderate, low and/or very low) and definitions for each category that are consistent with the definitions used by the GRADE Working Group.
- Evidence summaries and evidence to decision criteria should be used as the basis for judgments about the certainty in the evidence and the strength of recommendations. Ideally, evidence profiles should be used to assess the certainty in the evidence and these should be based on systematic reviews. At a minimum, the evidence that was assessed and the methods that were used to identify and appraise that evidence should be clearly described.
- Explicit consideration should be given to each of the GRADE criteria for determining the direction and strength of a recommendation or decision. Ideally, GRADE evidence to decision frameworks should be used to document the considered research evidence, additional considerations and judgments transparently.
- The strength of recommendations should be assessed using two categories (for or against an option), such as strong and weak/conditional, with definitions for each category that are consistent with the definitions used by the GRADE Working Group (although different terminology may be used).
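
The categories named in the criteria above can be pictured as a small data model. The following is only an illustrative sketch in Python, not part of the GRADE approach or any official GRADE tool: the class names, field names, and the placeholder values ("option A", "outcome 1", "outcome 2", and the certainty ratings assigned to them) are all assumptions made up for the example and do not describe any real assessment.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Certainty(Enum):
    """Certainty in the evidence, rated per important outcome."""
    HIGH = "high"
    MODERATE = "moderate"
    LOW = "low"
    VERY_LOW = "very low"


class Direction(Enum):
    """Direction of a recommendation relative to an option."""
    FOR = "for"
    AGAINST = "against"


class Strength(Enum):
    """Strength of a recommendation."""
    STRONG = "strong"
    CONDITIONAL = "weak/conditional"


@dataclass(frozen=True)
class OutcomeAssessment:
    """Hypothetical record: one certainty rating for one outcome."""
    outcome: str
    certainty: Certainty


@dataclass(frozen=True)
class Recommendation:
    """Hypothetical record tying a direction and strength to an option."""
    option: str
    direction: Direction
    strength: Strength
    outcomes: Tuple[OutcomeAssessment, ...]

    def label(self) -> str:
        # Human-readable summary, e.g. "strong recommendation against X".
        return (f"{self.strength.value} recommendation "
                f"{self.direction.value} {self.option}")


def example_recommendation() -> Recommendation:
    # Placeholder values only; these are not real ratings of any evidence.
    return Recommendation(
        option="option A",
        direction=Direction.FOR,
        strength=Strength.CONDITIONAL,
        outcomes=(
            OutcomeAssessment("outcome 1", Certainty.MODERATE),
            OutcomeAssessment("outcome 2", Certainty.LOW),
        ),
    )


if __name__ == "__main__":
    print(example_recommendation().label())
```

Keeping certainty attached to each outcome, separately from the direction and strength of the recommendation, mirrors the distinction the criteria draw between assessing evidence and making a recommendation. The sketch deliberately contains no rule for deriving one from the other, since that step rests on the explicit judgments the criteria describe.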