What do the results mean?
How do I read reports?
Main reports
The year-on-year report views show results in a coloured grid.
Each cell contains the mean score and a colour which indicates if the score is an outlier.
- Red: a red outlier is a score in the bottom quartile (Q1) of the benchmark group, and the confidence interval does not overlap with that of the benchmark mean.
- Pink: a score in the bottom quartile, but the confidence interval overlaps with that of the benchmark mean.
- White: a score in between the top and bottom quartiles of the benchmark group.
- Light green: a score in the top quartile (Q4), but the confidence interval overlaps with that of the benchmark mean.
- Green: a green outlier is a score in the top quartile of the benchmark group, and the confidence interval does not overlap with that of the benchmark mean.
- Grey: fewer than three results (n<3). We only report results which have three or more responses.
Question item reports
Question item reports display as a vertical bar chart or table.
In the chart view, the question text is shown at the top of the page, and the n or n range (the number of doctors who answered the question) is shown below the chart.
You can hover over the bars to see percentages for each answer.
In the chart view you can select other questions that make up the indicator.
What are indicators and how are indicator scores calculated?
What are indicators?
In the survey doctors answer questions based on their experience of training. Questions are grouped by theme and we refer to these groupings as indicators. We use indicators to measure how doctors feel about specific areas of training.
How are indicator scores calculated?
We use each doctor's score to build reports. For example, to show results by site, we average the scores of doctors at that site; to show results by specialty, we average the scores of doctors in that specialty.
Survey respondents must answer enough of the questions that make up an indicator for their scores to count towards the overall results in an indicator report. This rule is usually x-1, where x is the number of questions in the indicator. So if an indicator is made up of five questions, doctors must have answered at least four to be included in the results.
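The x-1 rule above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the floor of one answered question for a single-question indicator is an assumption, not something the survey documentation states.

```python
def counts_towards_indicator(answered: int, total_questions: int) -> bool:
    """Return True if a respondent answered enough questions to be included."""
    if total_questions < 1:
        raise ValueError("an indicator needs at least one question")
    # Usual rule: x-1 answers are needed for an indicator of x questions.
    # Assumption: at least one answer is always required.
    required = max(total_questions - 1, 1)
    return answered >= required


print(counts_towards_indicator(4, 5))  # True
print(counts_towards_indicator(3, 5))  # False
```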
In most cases, if one of the question responses is 'N/A', we do not score it and so it does not count towards an individual’s score for that indicator. There are also some indicators that have specific exclusion rules for certain specialties or training levels. For example, for Handover, the following rules apply:
- 'Exclude the following post specialties (any training level) from indicator calculation: Allergy, Audio Vestibular Medicine, Clinical genetics, Clinical radiology, Neuropathology, Paediatric Pathology, General Practice, Histopathology, Occupational medicine. OR the following training level(s): F1'
Because of these rules and exclusions, occasionally, the n range for an indicator report is slightly different to the n range for the individual question drill through.
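As a rough sketch, the Handover exclusion rule quoted above could be expressed as a simple lookup. The constant and function names are hypothetical, and the case-insensitive matching is an assumption made for convenience.

```python
# Post specialties excluded from the Handover indicator (any training level),
# taken from the rule quoted above and stored in lower case.
HANDOVER_EXCLUDED_SPECIALTIES = {
    "allergy",
    "audio vestibular medicine",
    "clinical genetics",
    "clinical radiology",
    "neuropathology",
    "paediatric pathology",
    "general practice",
    "histopathology",
    "occupational medicine",
}
# Training levels excluded from the Handover indicator (any specialty).
HANDOVER_EXCLUDED_LEVELS = {"F1"}


def excluded_from_handover(post_specialty: str, training_level: str) -> bool:
    """Return True if either the post specialty or the training level is excluded."""
    return (
        post_specialty.strip().casefold() in HANDOVER_EXCLUDED_SPECIALTIES
        or training_level.strip().upper() in HANDOVER_EXCLUDED_LEVELS
    )


print(excluded_from_handover("Histopathology", "ST3"))   # True
print(excluded_from_handover("General surgery", "F1"))   # True
print(excluded_from_handover("General surgery", "ST3"))  # False
```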
What are benchmarks?
What are benchmark groups?
Results are calculated by comparing a report group to a benchmark group. Doctors' scores contribute to the score for the report groups they are in (specialty, site, training level, etc). The scores are also used to calculate the benchmark score.
The benchmark group is the group of respondents whose scores you are comparing your report group to. For example, if your report group is general surgery your benchmark group would be all surgical specialties combined.
The report group is the group you are interested in looking at. For example, you can look at a post specialty by trust/board report to see how General psychiatry at your trust compares to General psychiatry at other trusts. In this example the benchmark group would be all psychiatry.
In the example below, in the benchmark group of All psychiatry (including General psychiatry), six out of 12 doctors responded positively. This gives a benchmark score of 50. In the report group General psychiatry, two out of three doctors responded positively. This gives a mean score of 66.67 for the report group. Dr A's score contributes to both the report group score and the benchmark score.
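The arithmetic in this example can be reproduced with a short sketch. It assumes, for illustration only, that a positive response scores 100 and any other response scores 0; the function name is hypothetical.

```python
def group_score(responses):
    """Mean score for a group; each item is True for a positive response."""
    if not responses:
        raise ValueError("a group needs at least one response")
    # Assumption: positive = 100, otherwise 0, so the mean is the percentage positive.
    return 100 * sum(responses) / len(responses)


benchmark_group = [True] * 6 + [False] * 6  # All psychiatry: 6 of 12 positive
report_group = [True, True, False]          # General psychiatry: 2 of 3 positive

benchmark_score = group_score(benchmark_group)
report_score = group_score(report_group)
print(round(benchmark_score, 2))  # 50.0
print(round(report_score, 2))     # 66.67
```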
Because different reports compare different groups, they use different benchmark groups. For example, Dr A is in an F1 General psychiatry post in Big City mental health trust. The scores calculated from Dr A's answers will contribute to, and be compared against, different benchmark groups in the different reports.
How do we use benchmark and reporting groups?
Reports compare the doctor's reporting group (eg level or specialty) against a benchmark group.
Different reports use different benchmark groups. This is to make sure that comparisons of groups are fair.
For example, if you filtered the post specialty by site report for general surgery at a specific hospital, the report would show you how that group compared with all surgery posts at all hospitals.
How to use the benchmark tables to find your benchmark group
- Find the reporting and benchmark groups table for the report you are looking at
- Look for your specialty, training level etc to see the benchmark for your reporting group
- Use the benchmark groups table for more detail on what is included in the benchmark group.
What are outliers and how do we calculate them?
What are outliers?
Outliers are scores that are significantly higher or lower than the average score. In the education data tool they are shown as red or green flags.
How do we calculate outliers?
To calculate outliers, we first calculate the benchmark group scores. We sort all of the scores in the benchmark group in order and split them into quarters.
We also calculate the mean score for the benchmark group (national mean). The national mean is included on the data downloads.
In most cases, scores in the bottom (Q1) and top (Q4) quartiles generate outliers. These are shown as red (below) and green (above) flags.
However, some indicators score 100 without being above outliers. If more than 25% of a benchmark group's scores are 100, no result can be in the top quartile, so no score can be flagged green or light green. These results may instead show within the interquartile range. The same principle applies at the other end of the range: if more than 25% of scores are 0, no result can be a below outlier (red or pink).
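A small sketch shows why this happens. It assumes quartile cut-offs computed with Python's `statistics.quantiles` using the inclusive method, and that a score must be strictly above the upper cut-off to count as top quartile; the exact method used by the reporting tool is not stated, and the benchmark scores are made up for illustration.

```python
from statistics import quantiles


def quartile_cutoffs(scores):
    """Return the lower (first quartile) and upper (third quartile) cut-offs."""
    q1, _median, q3 = quantiles(scores, n=4, method="inclusive")
    return q1, q3


# Hypothetical benchmark group: 3 of 8 scores (37.5%) are 100.
benchmark_scores = [40, 55, 60, 70, 75, 100, 100, 100]
q1, q3 = quartile_cutoffs(benchmark_scores)

# The upper cut-off is already the maximum score, so nothing can exceed it.
can_be_above_outlier = any(score > q3 for score in benchmark_scores)
print(q3)                    # 100.0
print(can_be_above_outlier)  # False
```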
What are confidence intervals?
A confidence interval is the range of values within which we expect the 'true' mean to lie, to a given level of confidence (95% in the survey), accounting for random error. That is, for 95% of confidence intervals, the true mean lies within the interval.
Hospital A: 100 doctors complete the survey, and most of them agree that training is good. We have lots of data and a high level of agreement within the group. This means that we can be confident about the score and the confidence interval is small.
Hospital B: Three doctors complete the survey, and each of them gives a different response. We have a small amount of data and no agreement within the group. This means that we are less confident about the score and the confidence interval is large.
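The contrast between the two hospitals can be illustrated with a rough calculation. This sketch assumes a normal-approximation 95% interval (mean plus or minus 1.96 standard errors), which may not be the method the survey uses, and the individual scores are invented to match the descriptions above.

```python
from math import sqrt
from statistics import mean, stdev


def confidence_interval(scores, z=1.96):
    """Approximate 95% confidence interval for the mean (normal approximation)."""
    centre = mean(scores)
    half_width = z * stdev(scores) / sqrt(len(scores))
    return centre - half_width, centre + half_width


# Hypothetical data: 100 doctors who mostly agree, and 3 doctors who all differ.
hospital_a = [75] * 80 + [50] * 20
hospital_b = [0, 50, 100]

low_a, high_a = confidence_interval(hospital_a)
low_b, high_b = confidence_interval(hospital_b)
width_a = high_a - low_a
width_b = high_b - low_b
print(f"Hospital A interval width: {width_a:.1f}")
print(f"Hospital B interval width: {width_b:.1f}")
```

With so few responses, the approximate interval for Hospital B is much wider than the one for Hospital A and can even extend beyond the 0 to 100 score range, which is one reason a real implementation might use a different method for small groups.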
Quartiles are also used to determine the flags. The first quartile is the value below which 25% of indicator scores lie and above which 75% lie. The third quartile is the value below which 75% of indicator scores lie and above which 25% lie.
We calculate the confidence intervals for the national mean and the report group. Finally, we compare your score and confidence interval to the benchmark score and its confidence interval. If the score is significantly lower or higher than the national mean, the result is highlighted red or green. Where the score is in the bottom or top quartile but its confidence interval overlaps with the confidence interval of the national mean, the box is highlighted pink or light green.
Below are some examples of how the scoring system works. In the examples, the national mean and the national confidence intervals are represented by a black dot and whiskers.
Figure 1: Hospital A has scored 33 on the indicator. This falls in the first quartile and Hospital A's confidence interval does not overlap with that of the national mean, so it will score a red flag.
Figure 2: Hospital B has scored 44 on the indicator, which falls in the first quartile. However, Hospital B’s confidence interval overlaps with the confidence interval for the national mean, so it will score a pink flag for this indicator.
Figure 3: Hospital E has scored 67 on the indicator. This is in neither the first nor the fourth quartile, so it will score a white flag.
Figure 4: Hospital G has scored 78 on the indicator, which falls in the fourth quartile. Hospital G’s confidence interval overlaps with the confidence interval for the national mean, so it will score a light green flag.
Figure 5: Hospital H has scored 82 on the indicator. This falls in the fourth quartile and Hospital H's confidence interval does not overlap with that of the national mean, so it will score a green flag.
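The flag rules illustrated in the figures above can be sketched as a single function. The function name, the strict comparisons against the quartile cut-offs, and all of the quartile and confidence interval values in the usage example are assumptions made for illustration. Only the five hospital scores come from the figures.

```python
def flag_colour(score, score_ci, benchmark_ci, q1, q3, n):
    """Return the flag colour for a report group score.

    score_ci and benchmark_ci are (low, high) tuples; q1 and q3 are the
    benchmark group's lower and upper quartile cut-offs; n is the number
    of responses in the report group.
    """
    if n < 3:
        return "grey"  # results with fewer than three responses are not reported
    # Two intervals overlap if each starts before the other ends.
    overlaps = score_ci[0] <= benchmark_ci[1] and benchmark_ci[0] <= score_ci[1]
    if score < q1:
        return "pink" if overlaps else "red"
    if score > q3:
        return "light green" if overlaps else "green"
    return "white"


# Hypothetical benchmark values: quartile cut-offs and the national mean's interval.
Q1, Q3, NATIONAL_CI = 50, 75, (60, 64)

print(flag_colour(33, (25, 41), NATIONAL_CI, Q1, Q3, n=40))  # red
print(flag_colour(44, (20, 68), NATIONAL_CI, Q1, Q3, n=5))   # pink
print(flag_colour(67, (60, 74), NATIONAL_CI, Q1, Q3, n=30))  # white
print(flag_colour(78, (62, 94), NATIONAL_CI, Q1, Q3, n=6))   # light green
print(flag_colour(82, (78, 86), NATIONAL_CI, Q1, Q3, n=50))  # green
```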
What happens next?
Deaneries/NHS England local offices
The results are used as a screening tool by deaneries/NHS England local offices to help them decide where there might be areas that need looking into.
Usually, survey results are triangulated with other sources of information to help ensure that resources are allocated to the right areas.
Deaneries/NHS England local offices and local education providers will review their regional and local results straight away and start planning their quality assurance and improvement activity for the next few months. Deaneries/NHS England local offices report back to us periodically through their dean's reports.
Royal colleges
Royal colleges use results from the programme specific questionnaires to monitor the delivery of curricula.
General Medical Council
We use survey data to help quality assure medical education and training across the UK.