
Driving better regulation and regulatory scrutiny in the European Commission
Concerted efforts have been made in recent years to promote better practices in the evaluation of EU legislation and programs. The related guidance on the content and conduct has been revised and extended, and processes have changed. A strengthened 'evaluation cycle' has been designed to provide the evidence needed at each stage of the policy cycle (see Figure 1).

The Regulatory Scrutiny Board (RSB) provides early-stage advice to Commission services on evaluation design. It assesses evaluations against six parameters and gives them either a positive or negative rating in a check-and-feedback system, with the goal of fostering good practice.
It is widely recognized that more progress is required to deliver consistently strong evaluative evidence on the performance of EU policies and programs. Evaluation designs are often not capable of generating robust estimates of impact in ways that give confidence that the intervention caused the observed change, confirming 'what worked.'
Evaluation specifications are often loaded with more questions than can be addressed with the time or resources available. The data needed to achieve deep insight are often scarce.
In its role as a supplier of evaluation services to directorates-general across the Commission and EU agencies, ICF receives, assesses, and responds to dozens of evaluation specifications every year. We see the same challenges repeatedly appearing in different policy contexts. There are many examples of good practice in individual policy areas, but there is a need to mainstream these across the evaluation community.
As evaluation professionals, we believe that further improvement in the performance of the EU's 'evaluation system' is achievable. Targeted adjustments could help the insights gained at each stage inform the decisions made at the next step of the policymaking process, facilitating progress towards our collective goal of better public policy. These adjustments are captured in our six-point plan, as summarized below.
Ambition 1: An intervention logic and theory of change should be prepared and published soon after the measure is adopted
Too many evaluations still start with the development of an intervention logic for the policy or program that the study is to address. It may be that no intervention logic was prepared when the measure was designed some years previously. Alternatively, an intervention logic was developed as part of the ex-ante impact assessment, but the changes made during the political process meant that the original intervention logic no longer fit the measure that was adopted.
Retrospectively building an intervention logic several years after the policy was adopted is not optimal. The assumptions of the original architects of the policy about how and why it would work, and what would happen in the absence of a new intervention, are rarely well documented.
Contemporary evaluators find themselves projecting their perceptions of the original problem onto the past, and developing post hoc assumptions about how the measure was supposed to function.
This problem could be addressed by requiring an intervention logic as part of the post-adoption protocols for any new policy or program. The intervention logic should be accompanied by a description of the theory of change (i.e., a narrative explanation of how the intervention is expected to work), including the assumptions about matters such as the wider conditions within which it is implemented and the potential risks.
If the impact assessment does not already document it well, the documentation should also include an explicit discussion of the 'do nothing' scenario that was the alternative to the policy or program, as projected into the future.
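To illustrate, rather than prescribe, what such documentation might capture, the sketch below shows one possible structured representation of an intervention logic and theory of change recorded at adoption. The field names are illustrative assumptions, not a Commission template; they simply mirror the elements discussed above (causal chain, assumptions, risks, and the 'do nothing' baseline).

```python
from dataclasses import dataclass, field

@dataclass
class TheoryOfChange:
    """Illustrative (hypothetical) record of an intervention logic at adoption."""
    problem: str                    # the need the measure responds to
    inputs: list[str]               # resources committed
    activities: list[str]           # what the intervention does
    outputs: list[str]              # direct deliverables
    outcomes: list[str]             # changes expected among target groups
    impacts: list[str]              # longer-term effects sought
    assumptions: list[str] = field(default_factory=list)  # conditions the causal chain relies on
    risks: list[str] = field(default_factory=list)        # factors that could break the chain
    do_nothing_scenario: str = ""   # projected baseline in the absence of the measure
```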
Ambition 2: An evaluation plan should be prepared and published before the measure is implemented
Another problem that bedevils ex-post evaluation is that the information needed to address the questions that matter has not been collected. In many instances, the ex-post evaluation comes too late for accurate data to be recovered. Implementation costs were not measured, unsuccessful program applicants have disengaged, and memories of how decisions were influenced have faded.
It would be better to ensure from the outset that the monitoring system is aligned to the evaluation needs, and that information is collected at the right time. Preparation of an evaluation framework that identifies the critical questions and the knowledge that will be needed to address them would do this.
The intervention logic specified above should be accompanied by an evaluation plan that defines the key questions to be addressed by a future evaluation, and identifies the data that will need to be collected to address them. It could build on the monitoring and evaluation plan that should be included as part of the ex-ante impact assessment, updated and expanded as required to capture the impacts of any changes made during the legislative process.
It should identify data sources and where additional research effort is likely to be needed. Requirements for the collection of monitoring data by program participants, Member State authorities, etc. should be specified in detail to ensure that the figures are comparable. The plan should include measures to check, on an ongoing basis, that the data collected are fit for purpose.
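As a hedged sketch of how such a plan might tie questions to data needs, the hypothetical entry below links one evaluation question to an indicator, data source, collection point, and quality check. The structure and example values are invented for illustration; any real plan would follow the Commission's own templates.

```python
from dataclasses import dataclass

@dataclass
class EvaluationPlanItem:
    """Hypothetical structure for one entry in an advance evaluation plan."""
    question: str          # key evaluation question to be answered later
    indicator: str         # evidence needed to answer it
    data_source: str       # e.g. monitoring returns, Member State authorities, survey
    collected_by: str      # who is responsible for collection
    collection_point: str  # when the data must be captured
    quality_check: str     # how fitness-for-purpose is verified on an ongoing basis

# Invented example: capturing adjustment costs around entry into force
adjustment_costs = EvaluationPlanItem(
    question="What one-off adjustment costs did firms incur to comply?",
    indicator="Average one-off compliance cost per firm, by size class",
    data_source="Targeted survey of affected firms",
    collected_by="Member State authorities / evaluation contractor",
    collection_point="Within 12 months of entry into force",
    quality_check="Annual review of response rates and cross-country comparability",
)
```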
In an EU context, this advance planning increases the scope to align Member States' evaluation plans with the evaluations conducted at EU level, and specify more closely the requirements for data collection at Member State level. It would give more power to the collective EU and national investment in evaluation.
It can also ensure that research effort is focused on the expected impact 'hot spots'. If, for instance, significant adjustment costs are expected when new legislation comes into force, then arrangements should be made for specific research at that point.
For it to be fit for purpose, this evaluation plan may require dedicated scoping studies to examine the options and their respective merits in more detail. Such studies can create space for innovative thinking and for testing the application of new evaluation approaches.
A requirement for an independent quality check on evaluation plans could help to ensure a high standard of development. There could be a tiered approach in which the RSB assessed plans linked to more significant legislation.
Ambition 3: Make more use of robust evaluation designs that test for causal links between the intervention and the observed changes
Many EU evaluations use theory-based approaches to explore whether the measure worked as intended. Their appraisal of impacts usually involves comparison of the 'before' situation (which is often not well documented) with the 'after' situation, some years after the measure was adopted.
As an approach to identifying attributable impacts this evaluation design has severe limitations: it is not possible to isolate the effects of the intervention from all the other influences on the situation of concern. If factors unrelated to the measure may have contributed to the observed changes in the data, it is hard to make a robust case for what the actual effects of the policy were.
It can be possible to generate evidence of a higher standard if the impact evaluation design is specified early on, and ideally before the measure is implemented. Early preparation means that we do not miss opportunities to use an evaluation approach that provides robust evidence of a causal link between the intervention and the observed changes in the indicators of interest.
The Maryland Scientific Methods Scale (SMS), and variants thereof, is often referenced in this context. A representation of the scale, which originated in the evaluation of crime prevention measures in the U.S. but has since been applied to many other fields, is provided in Figure 2.

It describes a series of levels of increasingly robust types of evidence on the impact of an intervention. There are many variations of the Maryland scale, but all share the same basic structure. The 'before and after' comparisons typical of much EU evaluation activity today are situated at the lowest level. Evaluations that compare the impact of the program to a counterfactual or control group will qualify for Level 3. When the 'treatment' applied by the program is randomly assigned and there are robust control groups, the evidence qualifies as Level 5 (a randomized controlled trial, or RCT).
In a public policy environment, the higher levels of the scale are often out of reach. Randomized application of EU legislation would not be desirable or feasible. But in programs, there are often opportunities to aim for Level 3.
For this to be achieved, thinking must take place in advance so that the need for data on the control, or counterfactual, group can be provided for in the monitoring plan. A variety of analytical techniques are available to help in situations where counterfactuals may be hard to identify.
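To illustrate why a comparison group matters, the minimal sketch below computes a difference-in-differences estimate from invented figures: the change observed in a comparison group is subtracted from the change observed in the supported group, so trends that would have occurred anyway are netted out, which a simple before-and-after comparison cannot do.

```python
# Minimal difference-in-differences sketch using invented figures.
# A before/after comparison alone would attribute the whole change in the
# supported group to the program; subtracting the comparison group's change
# removes trends that would have happened anyway.

supported = {"before": 54.0, "after": 63.0}   # e.g. average outcome for participants
comparison = {"before": 52.0, "after": 57.0}  # similar non-participants

naive_before_after = supported["after"] - supported["before"]   # 9.0
background_trend = comparison["after"] - comparison["before"]   # 5.0
did_estimate = naive_before_after - background_trend            # 4.0

print(f"Before/after change in supported group: {naive_before_after:.1f}")
print(f"Change in comparison group:             {background_trend:.1f}")
print(f"Difference-in-differences estimate:     {did_estimate:.1f}")
```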
There are examples of such designs being used for EU program evaluations, and the European Commission's Joint Research Centre has a Centre for Research on Impact Evaluation. But studies of this design are still relatively uncommon in EU policy.
Beyond Brussels, some evaluation commissioners are now setting SMS Level 3 as the minimum standard for impact evaluations. The EU evaluation system is not yet ready for such a rule to be applied to European policies and programs, but there is certainly scope to set an ambition to do better.
In a context where there is ever-increasing pressure to demonstrate the impact of public spending, robust evidence of influence has significant value.
Ambition 4: Use scoping studies to help bring design innovation and new techniques into more substantial evaluations
Designing a robust assessment of a large, complex program with multiple objectives applied across 28 countries is a non-trivial challenge. Development of a creative evaluation design that makes the best use of available information and methods requires time and space and may benefit from a variety of expert inputs.
Existing evaluation procurement makes innovation difficult. The incentives on both commissioner and contractor favor reduction of risk rather than methodological experimentation.
The evaluation study terms of reference typically prescribe the method in great detail. On the Commission's evaluation framework contracts, contractors are generally given 10 or 15 working days from issue of tender to submission of their offer. They will often have had no advance notice of the tender's release, so they have to focus much of their efforts within that period on finding background research and relevant subject matter expertise. They typically have little visibility of the existing data available to them.
As noted above, scoping studies can have an instrumental role in advance evaluation planning. Scoping studies, used selectively, can be equally useful in ex-post evaluations. Commissioned before the primary evaluation, they provide space to identify and test options, in addition to exploring ideas and taking advice from a wider variety of methodological and other experts.
We propose that scoping studies be used more extensively, with a focus on large and complex EU programs. These scoping studies should not be procured via the main evaluation contracts, and could perhaps involve the JRC. The contractor and advisors involved in the scoping study should be prohibited from undertaking the subsequent evaluation, to ensure a fair, competitive procurement of the follow-on main study.
Ambition 5: Ensure that impact assessment and evaluation research capture a representative sample of the target stakeholders
A common issue with both ex-ante impact assessments and ex-post evaluations is that they are not equipped to provide the robust estimates required by the specifications.
Rather than survey statistically representative samples of the affected stakeholders, they rely on engagement with intermediaries, such as industry associations, or on a non-representative, and often small, sample of stakeholders. A public consultation does not provide equivalent results: the respondents are self-selected, and there is no way to know whether they are representative of the wider population of the target stakeholder group.
This issue frequently arises in the assessment of potential or actual impacts on small and medium-sized enterprises (SMEs), a constituency of specific concern in the development of EU policy.
In many sectors, SMEs are numerous. It is often the case that individual firms are not actively involved with representative organizations. Available SME panels have been limited in their coverage.
Collecting data from a large enough number of SMEs in all Member States for the results to be safely taken as representative of the broader population is not straightforward, and not without cost.
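As a rough illustration of what 'large enough' means, the sketch below applies the standard sample-size formula for estimating a proportion, with a finite population correction. The population figure is hypothetical, and real survey design would also account for response rates and stratification by Member State and size class.

```python
import math

def required_sample_size(population: int, margin_of_error: float = 0.05,
                         confidence_z: float = 1.96, proportion: float = 0.5) -> int:
    """Sample size needed to estimate a proportion, with finite population correction."""
    n0 = (confidence_z ** 2) * proportion * (1 - proportion) / (margin_of_error ** 2)
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

# Hypothetical: 200,000 affected SMEs, +/-5 percentage points at 95% confidence
print(required_sample_size(200_000))                          # ~384 completed responses
# Tightening precision to +/-2 points needs roughly six times as many responses
print(required_sample_size(200_000, margin_of_error=0.02))
```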
In some sectors firms are not easily engaged via online surveys; telephone or even face-to-face methods may be required. And for ex-post evaluations, more than one wave of research may be needed.
The problem also arises where policy targets individual people, such as programs relating to employment, integration or health, or large-scale communication campaigns. Without direct, properly-sampled evidence from those affected, it is much harder to develop meaningful assessments of impact.
If the specification and resourcing of ex-ante impact assessments and evaluations do not address this issue, there will continue to be higher levels of uncertainty attached to the estimates produced by impact assessments, especially those relating to measures that affect a large number of stakeholders.
A proportionate approach should be taken. Where there is a risk of substantial impacts on a large number of firms or individuals, the impact assessments should be resourced to assess them accurately.
The collection of data through large-scale surveys using traditional methods is expensive. For evaluations, there may be scope to cascade the sampling obligation into the agreement reached with the Member States on the evaluation plan. But, looking ahead, there is also a need to look for new sources of 'ready-made' data (i.e., big data) to cost-effectively gather and analyze vast quantities of information for EU evaluation purposes. New data sources, including online data, and new methodologies will be needed to provide affordable, reliable insights about human behavior in this digital age.
Ambition 6: Develop a cross-cutting program to capture and disseminate lessons from impact assessments and evaluations to help improve evaluation practice
A large number of evaluations and impact assessment studies are completed each year by contractors working on behalf of the Commission and the EU agencies. Collectively, these represent a significant investment in policy analysis and a substantial body of evaluation practice. Yet little is done to capture the lessons available from individual studies and the portfolio as a whole for evaluation practice and future policy development.
This could easily be resolved by:
- including a 'lessons learned' requirement in the terms of reference of evaluations. The evaluator would need to provide a commentary on the success of the methodologies applied, the lessons learned, and the utility of the program's monitoring system and evaluation planning.
- supporting an analytical program that looks across the stock of evaluations and monitors the ongoing flow of new reports to identify common learning points, then using these learnings to provide practical advice to evaluation commissioners and practitioners for future application.
By taking this systematic approach, the program would be able to identify gaps in methodologies and resources that, if addressed, could help to improve evaluation practice.
The program could also look at the evidence on the efficacy of different policy measures and the evidence on 'what works' in different areas of policy, where there are critical gaps that could be addressed by commissioned research. In impact assessments, this cross-cutting approach would help to identify areas where more guidance would improve consistency across impact assessment studies (e.g., in the use of standardized cost factors).
In this way, a small additional investment could significantly increase the added value of overall evaluation spending and help foster learning and continuous improvement in evaluation practice and policy design.