Health Education Research Advance Access originally published online on October 12, 2005
Health Education Research 2006 21(2):219-229; doi:10.1093/her/cyh058

© The Author 2005. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oxfordjournals.org

Project quality rating by experts and practitioners: experience with Preffi 2.0 as a quality assessment instrument

Gerard R M Molleman1,*, Louk W H Peters1, Clemens M H Hosman2,3, Gerjo J Kok4 and Paul Oosterveld5

1 Netherlands Institute for Health Promotion and Disease Prevention, Centre for Knowledge and Quality Management, PO Box 500, NL-3440 AM Woerden, the Netherlands
2 Prevention Research Centre, Department of Health Education and Promotion, University of Maastricht, Maastricht, the Netherlands
3 Department of Clinical Psychology, University of Nijmegen, Nijmegen, the Netherlands
4 Faculty of Psychology, University of Maastricht, Maastricht, the Netherlands
5 Methodology, Amsterdam, the Netherlands

*Correspondence to: G. R. M. Molleman. E-mail: gmolleman{at}nigz.nl


    Abstract
Preffi 2.0 is an evidence-based Dutch quality assessment instrument for health promotion interventions. It is mainly intended both for planning and for assessing one's own projects, but it can also be used to assess other people's projects (external use). This article reports a study of the reliability of Preffi as an external quality assessment instrument. Preffi assesses quality at three levels: (i) specific criteria, (ii) clusters of criteria and (iii) entire projects. The study compared Preffi-based assessments of 20 projects by three practitioners with their intuitive assessments of the same projects and with assessments by three experts, which were intended to serve as external criteria. The intuitive assessments related only to the cluster and project levels. Our main hypothesis was that intuitive assessments by practitioners would be less reliable and accurate than both their Preffi-based assessments and the experts' assessments. On the whole, we failed to confirm this hypothesis: the experts' assessments proved less reliable and accurate than the practitioners' intuitive and Preffi-based assessments, and differed too much from each other to be used as external criteria. The practitioners' Preffi-based assessments had an acceptable generalizability coefficient (G) and accuracy (standard error of measurement, SEM). At the level of the entire project, two assessors are needed to produce sufficiently reliable and accurate assessments, whereas three are needed at cluster level. The study also showed that different assessors adopt different perspectives and base their assessments on a variety of aspects. The assessors themselves regarded this as inevitable and even useful. Discussions between assessors are important for achieving consensus. The article suggests some improvements to Preffi to further increase its reliability.


    Introduction
The Preffi (Health Promotion Effect Management Instrument) is a Dutch quality assessment instrument for health promotion (HP) interventions, based on research findings about programme and project aspects that affect effectiveness and quality. It allows users, mostly HP practitioners, to assess the degree to which a programme incorporates conditions and aspects generally acknowledged to contribute to effectiveness, and to suggest improvements. Effectiveness is one important aspect of quality; other commonly distinguished aspects include ethics and client satisfaction. Preffi was developed in the mid-1990s, after detailed and lengthy consultations with researchers and practitioners. In the early 2000s, it was thoroughly revised and updated into a new version, Preffi 2.0. The revision was again undertaken in close collaboration with researchers and practitioners, and was partly based on research into the instrument's usefulness and reliability. Both development processes have been described in detail elsewhere [1–4]. Preffi is the most elaborate quality assessment instrument in this domain and has received considerable attention both in the Netherlands and internationally [5–9]. In the Netherlands, Preffi will be used as a key document for the National Quality System for Health Promotion. In addition to the Dutch version, Preffi 2.0 has been translated into English, French and Hungarian, and a Norwegian translation is in preparation.

Preffi 2.0 consists of 39 quality criteria (for criteria see Table II), subdivided into eight clusters: problem analysis, determinants of behaviour and environment, target group, objectives, intervention development, implementation, evaluation and contextual conditions and feasibility. The names of the first seven clusters show that Preffi 2.0 is based on planning models [10, 11], whereas the final cluster emphasizes the importance of project management.


Table II. G and SEM values for each individual item, per cluster, and for the assessments as a whole, plus the numbers of assessors needed to achieve an accurate assessment, per item, with the corresponding G values

 
In the Preffi 2.0 version, the assessment aspect is more fully specified than in the 1.0 version, which did not clearly indicate when a project satisfied a particular quality criterion [12]. To this end, specific operationalizations and norms were developed for each of the criteria in the 2.0 version (see Box 1 and www.preffi.nl). These were expected to provide users with clear instructions and thus make their assessments more in agreement and, therefore, more reliable. This assumption is tested and reported in this paper.


Box 1. Example of an operationalization and norm for a Preffi criterion
5.2. Objectives are specific, specified in time and measurable

Operationalization:

  1. Do the objectives specify the factors to be changed? (Tip: this was addressed in 5.1)
  2. Do the objectives specify the target group for which the intended objective is to be achieved?
  3. Do the objectives specify the intended magnitude of the effects (e.g. a 10% reduction)?
  4. Do the objectives specify the time within which the objectives are to be achieved?
Norms:
  1. Weak: Question 1 and/or 2 = no
  2. Moderate: Question 1 = yes, Question 2 = yes, Question 3 = no and Question 4 = no
  3. Strong: Question 1 = yes, Question 2 = yes and Questions 3 and/or 4 = yes
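The norms in Box 1 are effectively a decision rule over the four yes/no answers, and together they cover all 16 possible answer patterns. A minimal Python sketch of that rule for criterion 5.2 follows; the function name and the boolean encoding are our own illustrative choices rather than part of Preffi, and the ‘not assessable’ option that some other criteria allow is not modelled.

```python
def rate_criterion_5_2(q1: bool, q2: bool, q3: bool, q4: bool) -> str:
    """Apply the Box 1 norms for criterion 5.2 (hypothetical helper).

    q1: objectives specify the factors to be changed
    q2: objectives specify the target group
    q3: objectives specify the intended magnitude of the effects
    q4: objectives specify the time frame
    """
    # Weak: question 1 and/or question 2 answered 'no'
    if not (q1 and q2):
        return "weak"
    # Strong: questions 1 and 2 'yes', plus question 3 and/or question 4 'yes'
    if q3 or q4:
        return "strong"
    # Moderate: questions 1 and 2 'yes', questions 3 and 4 both 'no'
    return "moderate"


# Factors and target group specified, magnitude given, no time frame
print(rate_criterion_5_2(True, True, True, False))  # strong
```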

 

The instrument is mainly intended, and used, in HP practice as a planning tool and as an internal assessment tool for assessing one's own projects. However, because we were interested in agreement between assessors, and practitioners work on different projects, this study used Preffi as an external quality assessment instrument, that is, to assess other people's projects.

A preliminary test of the feasibility of a draft 2.0 version showed that users appreciated the specificity of the operationalizations and norms [6, 13, 14]. The test also gave some indications of the instrument's reliability, as well as suggestions for improving the descriptions in Preffi. It further informed the design of a follow-up reliability study, indicating, for example, how many assessors were needed and that the study should include a variety of projects, a clear manual and training for Preffi users. The present paper reports the design and results of this follow-up study, concentrating on the reliability of assessing other people's HP projects with or without Preffi, and on comparisons between the assessments made by practitioners using Preffi and those made by experts assessing the same projects.

The present study aims to answer a number of research questions. (1) Do quality assessments of HP projects by different experts show sufficient agreement to be used as a criterion to evaluate Preffi-based assessments of the same projects by practitioners? We hypothesized that experts, because of their experience as professional assessors on assessment committees, would share the same quality standards and hence show greater agreement in their assessment of projects (i.e. be more reliable) than practitioners. (2) Are the practitioners' Preffi-based assessments more reliable than their intuitive assessments of the same projects? (3) How reliable are the Preffi-based assessments and how many assessors are required for a sufficiently reliable and accurate assessment? (4) Do the experts' greater experience and clearer standards lead to stricter and, therefore, more negative judgements than those of the practitioners? (5) Are Preffi-based assessments by practitioners stricter than their intuitive assessments of the same projects? The Preffi criteria were deliberately made rather strict, with the aim of stimulating practitioners to adopt a critical attitude and to identify aspects of the projects that could be improved. (6) Are there notable aspects in the project scoring and how useful is a consensus meeting in the process of achieving consensus in the project quality rating?


    Methods
The present study examined the possible differences in quality assessments of 20 projects in three different conditions: (i) experts giving an intuitive judgement based on their expertise and experience, without any further guidelines; (ii) practitioners giving an intuitive judgement and (iii) the same practitioners assessing the same projects using Preffi 2.0. Before their Preffi-based assessment, the practitioners attended a one-day training session on using Preffi 2.0. After the results of the various assessments had been collected and analysed, they were discussed in a consensus meeting with the experts, the practitioners and the Preffi project team.

Study population
The preliminary test of the draft version of Preffi 2.0 had shown that three assessors would allow the reliability to be determined with sufficient precision or accuracy [14]. We, therefore, decided to use three experts and three practitioners to assess the projects.

The three experts we selected had for many years been professionally involved in the allocation of grants for research, development or implementation projects for HP. Their respective backgrounds were in research, policymaking and HP practice. The experts were familiar with Preffi 1.0 and knew that a new version was being developed; one of them was also familiar with the content of the new version.

Of the three practitioners, one was employed by a municipal health service, one by a mental health and addiction care institute and one by a national health education agency. They had had 6, 4 and 1 years of work experience, respectively. All were university educated, with degrees in, respectively, medical biology, educational theory and HP. The practitioners had some knowledge of Preffi 1.0, and knew that Preffi 2.0 was being developed but were unaware of its content until the training session.

Project selection
Various sources were used to collect 32 project descriptions: entries for the Preffi Award, a national project database and direct contacts with a number of HP or prevention practitioners. The projects related to the various domains of HP practice in the Netherlands (general health, mental health, addiction and home care). All project descriptions had the same structure: background; choice of objectives, target groups and interventions; implementation; organization and evaluation. Two members of the Preffi project team used Preffi 2.0 to assess all projects as good, moderate or poor. From these 32 projects, we ultimately selected 20, based on three criteria: (i) the availability of a sufficiently informative project description, allowing a large number of Preffi criteria to be scored; (ii) sufficient coverage of the various HP domains and themes and (iii) sufficient numbers of good, moderate and poor projects, so that they would yield enough variance to allow a realistic reliability assessment.

Measurement instruments
Intuitive assessments by experts and practitioners
The practitioners and experts were invited to assess the projects by reading the 20 project descriptions, which were presented in a different order to each of the three assessors, and to fill in a scoring form for each project. This form asked for intuitive marks between 1 (low quality) and 10 (high quality) for the eight general aspects of each project (problem analysis, determinants, target group, objectives, intervention development, implementation, evaluation and contextual conditions), as well as an overall mark for the project as a whole. Since these aspects correspond to the clusters used in Preffi, they are referred to below as intuitive cluster assessments. In addition, the assessors were asked to indicate how much time they needed to assess each project, whether and to what extent they were already familiar with it, and whether this familiarity had influenced their assessment positively or negatively.

Preffi assessments by practitioners
After the intuitive assessments, each practitioner assessed an example project using the Preffi score form. The practitioners then attended a 6-hour training session that covered Preffi 2.0 in depth, including the assessment of the example project and difficulties with the scoring procedure, such as how to deal with missing information. Next, the practitioners assessed the 20 projects once more, this time using Preffi 2.0. Again, the projects were presented in a different order to each of the three participants, who were asked to indicate how much time they needed to assess each project with Preffi.

Preffi has three levels of scores.

(i) Criterion score: each of the 39 quality criteria has been operationalized as one or more yes/no questions, which together yield a rating of ‘weak’, ‘moderate’ or ‘strong’ for that particular criterion (see Box 1 for an example). In addition, 12 of the criteria allow users to select the option ‘not assessable’.
(ii) Cluster score: each cluster is given a report mark between 1 and 10, which we recommended basing on the scores (‘weak’, ‘moderate’ and ‘strong’) for the criteria in that cluster. For example, a cluster with three criteria could be given a mark of 9 or 10 if all three criteria had been rated ‘strong’.
(iii) Project score: the project as a whole is given a score between 1 and 10, comparable to the intuitive assessment. It is suggested that the project score be the mean of the eight cluster scores, rounded to a whole number.
Practitioners were told that they could deviate from the general rule for calculating cluster and project scores if they wanted to give greater weight to certain criteria or clusters.
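The suggested rule for the project score can be sketched in Python as below. The text gives no numeric rule for turning criterion ratings into a cluster report mark, so the sketch starts from the cluster marks. Two assumptions are ours: ‘rounded’ is taken to mean rounding halves upwards (Python's built-in round() would turn 6.5 into 6), and the optional weights, which illustrate an assessor deviating from the rule, are hypothetical.

```python
import math
from typing import Optional, Sequence


def project_score(cluster_scores: Sequence[float],
                  weights: Optional[Sequence[float]] = None) -> int:
    """Rounded (weighted) mean of the eight cluster report marks (1-10).

    With weights=None this follows the suggested rule: the plain mean of
    the cluster scores, rounded to a whole mark. The weights argument is a
    hypothetical illustration of attaching more importance to some clusters.
    """
    if len(cluster_scores) != 8:
        raise ValueError("Preffi 2.0 has eight clusters")
    if weights is None:
        weights = [1.0] * len(cluster_scores)
    if len(weights) != len(cluster_scores) or sum(weights) <= 0:
        raise ValueError("need one weight per cluster, with a positive sum")
    mean = sum(w * s for w, s in zip(weights, cluster_scores)) / sum(weights)
    # Round halves up (6.5 -> 7) instead of Python's round-half-to-even.
    return math.floor(mean + 0.5)


marks = [6, 7, 5, 8, 6, 7, 6, 7]              # hypothetical cluster marks
print(project_score(marks))                    # 7 (mean is 6.5)
print(project_score(marks, [1, 1, 1, 1, 1, 1, 3, 1]))  # 6: evaluation x3
```

Weighting the evaluation cluster (the seventh) three times lowers the mean from 6.5 to 6.4 in this made-up example, which shows how individual weighting choices can shift a project's mark.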

In the consensus meeting, the scoring of the projects was discussed first. Notable aspects of project scoring included the time required to assess a project, the influence of familiarity with projects, the use of the ‘not assessable’ option and the practitioners' use of the opportunity to deviate from the calculation rule for cluster scores. This was followed by a critical dialogue among all the assessors, aimed at reaching consensus on the projects' scores.

Data analysis
The reliability (Research Questions 1, 2 and 3) was assessed using generalizability theory [15]. Cronbach's α could not be used as a reliability estimate, because both raters and items may contribute to measurement error. Cronbach's α is applicable only when there is a single source of measurement error, whereas generalizability theory accommodates complex measurement designs with multiple sources of error.

Generalizability theory estimates the contributions of the various sources of error in a measurement. Ideally, differences in scores can be attributed to differences between the objects assessed, in this case the projects, but they may also be caused by error sources such as the assessors' differing views or the interaction between projects and assessors. The variance components were estimated with the VARCOMP procedure in SPSS, using restricted maximum-likelihood (REML) estimation [16].
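For a single criterion or score, the design is projects × raters with one score per cell. The study used REML in SPSS VARCOMP; the Python sketch below instead uses the classical ANOVA (expected-mean-squares) estimator, which for balanced, complete data such as 20 projects × 3 raters should coincide with REML whenever none of the estimates is negative. Truncating negative estimates at zero is a common convention adopted here, not necessarily what SPSS does. The function name and the toy data are hypothetical.

```python
from typing import List, Tuple


def variance_components(scores: List[List[float]]) -> Tuple[float, float, float]:
    """ANOVA estimates for a crossed projects x raters design.

    scores[i][j] is rater j's score for project i (one score per cell).
    Returns (var_project, var_rater, var_interaction_plus_error).
    """
    n_p, n_r = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_p * n_r)
    p_means = [sum(row) / n_r for row in scores]
    r_means = [sum(row[j] for row in scores) / n_p for j in range(n_r)]

    ss_p = n_r * sum((m - grand) ** 2 for m in p_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in r_means)
    ss_tot = sum((x - grand) ** 2 for row in scores for x in row)
    ss_pr = ss_tot - ss_p - ss_r  # interaction, confounded with residual

    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_pr = ss_pr / ((n_p - 1) * (n_r - 1))

    # Solve the expected-mean-square equations; truncate negatives at zero.
    var_pr = ms_pr
    var_p = max((ms_p - ms_pr) / n_r, 0.0)
    var_r = max((ms_r - ms_pr) / n_p, 0.0)
    return var_p, var_r, var_pr


# Hypothetical toy data: 4 projects, 3 raters, no project x rater interaction
scores = [[p + r for r in (0, 1, -1)] for p in (5, 6, 7, 8)]
print(variance_components(scores))  # project 5/3, rater 1.0, interaction 0.0
```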

These components were then used to calculate the generalizability coefficient (G) and the standard error of measurement (SEM). G is very similar to Cronbach's α and is interpreted in the same way. SEM can be used to compute a confidence interval for a score. The conventional minimum threshold for reliability coefficients such as G or α is 0.70 [17]. There is no generally accepted maximum value for SEM. A reasonable requirement would seem to be that the confidence interval of a score stays within the rounding zone of its score category, that is, within half a point above or below the category. In other words, the confidence interval around a score of 2 should not include scores <1.5 or >2.5. To achieve this level of precision, SEM needs to be <0.26. We then estimated G and SEM for a single assessor and calculated the minimum number of assessors needed to reach the minimum G and the maximum SEM.
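Given variance components for a projects × raters design, G and SEM for the mean of n raters can be sketched as follows. Two points are our assumptions, since the text does not state them: whether the rater main effect counts as error (an ‘absolute’ coefficient, the default here because the Results treat differences in assessors' views as error) or not (a ‘relative’ coefficient), and the confidence level. With a 95% interval, the rounding-zone requirement becomes 1.96 × SEM ≤ 0.5, i.e. SEM ≤ 0.255, which is consistent with the 0.26 cut-off. The function names are hypothetical.

```python
import math
from typing import Tuple

Z_95 = 1.96  # assumed 95% confidence level; the text does not state it


def g_and_sem(var_p: float, var_r: float, var_pr: float,
              n_raters: int, absolute: bool = True) -> Tuple[float, float]:
    """G coefficient and SEM for the mean score over n_raters raters.

    absolute=True also counts the rater main effect as error (assumption);
    absolute=False gives the relative (rank-order) coefficient.
    """
    err = var_pr / n_raters
    if absolute:
        err += var_r / n_raters
    g = var_p / (var_p + err)
    return g, math.sqrt(err)


def ci_stays_in_rounding_zone(sem: float, half_width: float = 0.5) -> bool:
    """True if score +/- Z_95 * SEM stays within half a point of the score."""
    return Z_95 * sem <= half_width


print(0.5 / Z_95)  # about 0.255, the SEM ceiling behind the 0.26 criterion
g, sem = g_and_sem(0.5, 0.1, 0.2, n_raters=3)  # hypothetical components
print(round(g, 3), round(sem, 3), ci_stays_in_rounding_zone(sem))  # 0.833 0.316 False
```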

G and SEM were computed at different levels of aggregation: for individual Preffi criteria, for the clusters of criteria and for all the criteria together. G and SEM were also computed for the intuitive cluster assessments, the cluster scores and the project score. The criteria and clusters are regarded as exhaustive and hence as fixed facets. If individual criteria or (intuitive) cluster/project scores are analysed, there are three variance components: projects, raters and the interaction of raters and projects. If several criteria/clusters are analysed together, there is also a criterion/cluster variance component, as well as the interactions of criteria/clusters with projects and raters. As we considered the criteria and cluster facets as fixed, their main effects were not taken into account when computing G and SEM.
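When several criteria are analysed together with the criteria treated as a fixed facet, the usual mixed-model treatment in generalizability theory (our reading, not formulas given in the text) moves the project × criterion effect into the universe-score variance and leaves the criteria main effect out of the error entirely. The Python sketch below applies this to the mean over n_r raters and n_c criteria; the component names, the absolute/relative switch and the example values are our own assumptions, not figures from the study.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Components:
    """Variance components for a projects x raters x criteria design."""
    p: float    # projects (the variance the assessment should detect)
    r: float    # raters
    c: float    # criteria main effect (unused: criteria are a fixed facet)
    pr: float   # project x rater
    pc: float   # project x criterion
    rc: float   # rater x criterion
    prc: float  # project x rater x criterion, confounded with residual error


def g_sem_fixed_criteria(v: Components, n_r: int, n_c: int,
                         absolute: bool = True) -> Tuple[float, float]:
    """G and SEM for the mean over n_r raters and n_c fixed criteria."""
    # Fixed criteria: the project x criterion effect is part of the universe
    # score, averaged over the n_c criteria that are always used.
    universe = v.p + v.pc / n_c
    # Relative error: interactions of projects with the random rater facet.
    err = v.pr / n_r + v.prc / (n_r * n_c)
    if absolute:
        # Absolute error also counts rater severity/leniency effects.
        err += v.r / n_r + v.rc / (n_r * n_c)
    return universe / (universe + err), math.sqrt(err)


# Hypothetical components for one cluster of three criteria, three raters
v = Components(p=1.0, r=0.2, c=5.0, pr=0.3, pc=0.6, rc=0.1, prc=0.9)
print(g_sem_fixed_criteria(v, n_r=3, n_c=3, absolute=False))  # about (0.857, 0.447)
```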

Apart from calculating the actual values of G and SEM, generalizability theory can also be used to estimate these coefficients for alternative measurement designs, for example with more or fewer raters. This is analogous to applying the Spearman–Brown formula in classical test theory. We used this to compute the minimum number of raters needed to achieve adequate reliability.
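The decision-study step, finding the smallest number of raters that meets both G ≥ 0.70 and SEM < 0.26, can be sketched as a simple search. Averaging over n raters divides the rater-related error variance by n; for the relative coefficient this reproduces the Spearman–Brown formula G_n = nG_1 / (1 + (n - 1)G_1). The function name, the default of counting the rater main effect as error and the example components are assumptions for illustration.

```python
import math
from typing import Optional


def min_raters(var_p: float, var_r: float, var_pr: float,
               g_min: float = 0.70, sem_max: float = 0.26,
               absolute: bool = True, n_max: int = 100) -> Optional[int]:
    """Smallest number of raters whose mean score meets both criteria.

    Starts from single-rater variance components and projects the error
    variance for n raters as single_err / n. Returns None if n_max raters
    are not enough.
    """
    single_err = var_pr + (var_r if absolute else 0.0)
    for n in range(1, n_max + 1):
        err = single_err / n
        g = var_p / (var_p + err)
        if g >= g_min and math.sqrt(err) < sem_max:
            return n
    return None


print(min_raters(0.5, 0.1, 0.2))  # 5 (hypothetical components)
```

In this made-up example G already exceeds 0.70 with two raters; it is the SEM requirement that drives the number up, which mirrors the paper's observation that accuracy, not reliability, is the binding constraint at criterion level.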

Research Questions 4 and 5 were analysed with t-tests.
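The text does not say which form of t-test was used. Because every condition covered the same 20 projects, one plausible reading, which is our assumption, is a paired t-test on project-level scores, for example each project's intuitive versus Preffi-based score averaged over the three practitioners. The standard-library Python sketch below returns only t and the degrees of freedom; a p-value needs the t distribution, for instance via scipy.stats.ttest_rel. The function name and the input values are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for matched scores.

    a[i] and b[i] are two scores for the same project i, e.g. the mean
    intuitive and the mean Preffi-based project score (hypothetical input).
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with n >= 2")
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    sd = statistics.stdev(d)  # sample SD of the differences (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = statistics.mean(d) / (sd / math.sqrt(n))
    return t, n - 1


t, df = paired_t([6, 7, 5, 8], [5, 6, 5, 6])  # hypothetical project scores
print(round(t, 3), df)  # 2.449 3
```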


    Results
Reliability and accuracy of the intuitive assessments (Research Question 1)
Combining all intuitive cluster assessments by the three experts yielded acceptable values for G and SEM (see Table I). With a single assessor, however, the minimum G (0.70) and the maximum SEM (0.26) were not achieved (G = 0.65; SEM = 0.45; data not shown in Table I).


Table I. Reliability (G) and accuracy (SEM) values of assessment scores of three assessors in three conditions

 
At the level of individual clusters, the experts did not obtain any G values above the minimum (0.70) or SEM values <0.26 for any of the aspects.

The experts showed lower levels of agreement (G) and precision (higher SEM values) than practitioners for nearly all clusters.

Reliability of Preffi-based assessments compared with intuitive assessments (Research Question 2)
As was explained in Methods, Preffi scores for clusters and projects in this section are computed at two levels of aggregation: as report marks (no aggregation) and as aggregated criteria scores. When only non-aggregated scores are considered (middle columns of Table I), reliability is generally higher and more often acceptable (G > 0.70) when Preffi is not used than when it is used. This is true for scores on entire projects and for specific clusters, and also when cluster scores are combined. Accuracy is unacceptable (SEM > 0.26) in all these cases, except when practitioners' intuitive cluster scores are combined. When aggregated Preffi scores are considered (right column of Table I), reliability and accuracy values of Preffi cluster scores are comparable to practitioners' intuitive (non-Preffi) cluster scores. In fact, the aggregated Preffi scores are reliable for seven out of eight clusters, one more than the intuitive non-Preffi-based scores.

Reliability and accuracy of Preffi-based assessments (Research Question 3)
Table II shows the reliability and accuracy of the assessments by the three practitioners across the 20 projects. It shows the relative size of the variance estimates for each source of variance (project, assessor and interaction effect between project and assessor) for each criterion and cluster and for all the Preffi criteria together. ‘Project’ as a source of variance is an estimate of the true score variance, which ideally should be as high as possible. The projects showed large enough differences for most criteria. Four of the five criteria which produced little or no variance between the projects were frequently given a ‘not assessable’ score.

The variance attributable to differences in views between the assessors should be as low as possible, since it is a source of error. A number of criteria performed poorly in this respect, such as ‘theoretical model’, ‘objectives fitting in with the analysis’, ‘timing of the intervention’ and ‘effective techniques’, which suggests that differences in the assessors' experience may play a role in assessing these aspects.

The G and SEM values in this table for all the Preffi criteria together and at cluster level have also been included in the column headed Preffi Criteria Scores in Table I. A combined analysis for the criteria per cluster yielded acceptable reliability (G) and accuracy (SEM) values for all clusters except that for target group, indicating that, at this level, Preffi is sufficiently reliable and accurate using three assessors.

Table II also shows for each criterion the number of assessors needed to obtain an acceptable SEM, that is, a sufficiently accurate assessment. This number ranges from 4 to 12 for the various criteria, with an average of 6.36 assessors per criterion.

Strictness of assessments: level of scores (Research Questions 4 and 5)
As Table III shows, mean project scores of the experts were significantly lower than those of the practitioners (5.6 versus 6.2). Also, at cluster level, experts had lower mean scores than practitioners, but the difference was only significant for the cluster intervention development.


Table III. Average overall and cluster scores (range 1–10) for each of the test conditions

 
The practitioners' mean project score was the same with and without Preffi. Cluster scores did differ, but not consistently in a positive or negative direction, and none of the differences was significant.

Notable aspects of project scoring (Research Question 6)
The time required for a Preffi-based assessment was initially about twice that for an intuitive assessment. After the assessors had assessed about 10 projects with Preffi, this difference had disappeared, with both types of assessment requiring an average of 50 min per project.

Prior knowledge about a project could have a clear impact on the assessment scoring. The respondents indicated that prior knowledge played a major role in the project score in 10% of all projects. In half of these cases, prior knowledge led to a higher score, and in the other half to a lower score. The consensus meeting confirmed that some of the striking deviations in scores were based on this phenomenon.

The practitioners differed in the extent to which they used the ‘not assessable’ option when assessing projects with Preffi. This option can be used, for instance, for items relating to project management, implementation and evaluation, aspects that are often difficult to assess on the basis of project descriptions. Although the training had paid considerable attention to scoring these aspects, the three assessors differed in their use of this option: those with more experience used it less often than those with less experience.

Preffi suggests a scoring method to calculate cluster scores from criteria scores. The practitioners were allowed to deviate from this scoring method and to attach differential weights to criteria, according to perceived importance of criteria. Practitioners' comments in the consensus meeting and additional analyses (data not shown) indicated that practitioners indeed made use of this opportunity.

Consensus meeting (Research Question 6)
The consensus meeting was used to discuss the assessments and the participants' experiences. They exchanged arguments for the various scores, with the intention of achieving consensus. The following experiences and insights emerged.

The participants had often found it hard to assess a project on the basis of a project description alone. They would have preferred to talk with the project leader to get more information on aspects not included in the project description. They also reported that having prior knowledge about a project made it easier to assess it. In addition, the practitioners preferred the Preffi-based assessment over the intuitive assessment, as they felt it produced a more balanced assessment and provided them with a clear instrument to assess and improve projects.

The participants were aware that people have their own assessment strategies and may attach greater importance to certain aspects than others, often based on their own profession, background and experience. One practitioner may feel more strictly bound to use the Preffi operationalizations than another, who may prefer to include the project's intentions in his or her assessment.

One of the experts emphasized that assessment processes in groups tend to seek consensus on the better projects [18]. Comments on shortcomings in a project quickly result in the project being dropped.

The interpersonal differences we found were deemed to be unavoidable and were even regarded as a positive aspect, since they may help the group achieve a balanced judgement. All participants felt that consensus meetings ought to be a regular feature of the process used to arrive at a final judgement on projects. The discussions during the meeting led to rapid agreement on all projects.

In most cases, the other participants eventually tended to agree with the expert whose assessments differed most from those of the group. This expert's arguments focused on applying the right theory and/or achieving a good fit between the intervention and its intended objective. Even though the participants felt that it was important to use various perspectives, content-oriented arguments proved most decisive at this meeting. It was clear that less experienced practitioners had in some cases been swayed by well-written project descriptions. Research in other occupations has also shown that experience and expert opinions play important roles in assessments [19].

The meeting clearly showed that assessing projects is not a simple matter and involves many aspects. Nevertheless, there was clear consensus that Preffi helps to identify weak aspects of a project and points that could be improved.


    Discussion and conclusions
This study assumed that expert assessments could be used as an external criterion with which the reliability of Preffi-based assessments could be compared. The findings show that, as far as experts are concerned, at least three assessors are needed to achieve reliable and accurate assessments. Agreement between the experts was weaker than between the practitioners. The intuitive assessments by practitioners and those based on Preffi criteria produced considerably more reliable and accurate project assessments, which more than met the required values of G and SEM defined in advance. The practitioners preferred the Preffi-based assessment.

The study showed that project assessment involves a multitude of aspects, some of which are revealed by the analyses of the scores. The practitioners attached different weights to the various Preffi criteria that have to be combined in a cluster score. Aspects like prior knowledge about a project also play a major part in project assessments and could have a positive or a negative influence.

The respondents found it hard to assess a project purely on the basis of a project description, as descriptions often lack certain information. Preffi criteria requiring an assessment of project management aspects, such as the appropriateness of the supplier (7.1.c), adequate support (1.1), capacity (1.2), acceptable and realistic objectives (5.3, 5.4), tailoring to the ‘culture’ of the target group (6.3b) and practical feasibility (6.5), are particularly hard to assess. This finding is consistent with those of an earlier study of the draft version of Preffi 2.0 [13, 14]. Hence, we would recommend including a discussion with the project leader in the assessment procedure.

The consensus meeting revealed the importance of discussion between assessors for achieving consensus and a final assessment. Consensus was reached without great difficulty, and the participants recognized that a project can be looked at from different perspectives. This should not be regarded as a ‘source of error’, as generalizability theory would have it: the participants regarded different perspectives among assessors as not only useful but essential. The meeting showed how complex an adequate assessment can be. The expert whose opinions deviated most from those of the others actually managed to sway most of them. This also indicated that Preffi includes a number of items [e.g. theory (3.1), determinants (3.2), the fit between analysis, objective and intervention (5.1 and 6.1a) and coherence (6.6)] that require more than an assessment by practitioners: they also call for an expert opinion on whether the justifications given are correct, as the least-experienced practitioner found it hard to see through favourable project descriptions.

As expected, the experts gave stricter assessments of projects than practitioners. Our hypothesis that the use of the strictly worded Preffi criteria would result in practitioners also giving more negative judgements than in their intuitive assessments was not confirmed. Although the practitioners themselves thought they had been stricter when using Preffi, the empirical data did not bear this out (data not shown).

Our findings show that the practitioners' Preffi-based cluster scores (report marks) yielded a low G coefficient and a high SEM, considerably poorer than those of their intuitive assessments of the same aspects. A possible explanation is that Preffi prompts assessors to examine certain aspects more specifically, leading to more differentiated assessments with greater variation. This effect is further amplified by the different weights attached to the criterion scores when cluster scores are calculated. Consequently, deriving cluster scores from criterion scores does not improve the instrument's reliability.
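To make the G and SEM values discussed above concrete, the following sketch estimates variance components for a fully crossed design in which every assessor scores every project, and derives a G coefficient and SEM from them. It is an illustration under stated assumptions, not the analysis procedure used in this study: we assume one score per project–assessor cell, estimate the components with the standard ANOVA (expected mean squares) method described in generalizability-theory texts such as [15], and use the absolute error variance (assessor plus residual variance), since the choice of error term is not specified here. The function names and the example scores are hypothetical.

```python
from statistics import fmean


def variance_components(scores):
    """Estimate variance components for a crossed projects x assessors design.

    scores[i][j] is the score assessor j gave project i (one score per cell,
    at least two projects and two assessors). Uses the ANOVA (expected mean
    squares) estimators; negative estimates are truncated to zero.
    Returns (var_project, var_assessor, var_residual).
    """
    n_p, n_r = len(scores), len(scores[0])
    grand = fmean(x for row in scores for x in row)
    p_means = [fmean(row) for row in scores]
    r_means = [fmean(row[j] for row in scores) for j in range(n_r)]

    # Mean squares for projects, assessors and the residual (interaction + error).
    ms_p = n_r * sum((m - grand) ** 2 for m in p_means) / (n_p - 1)
    ms_r = n_p * sum((m - grand) ** 2 for m in r_means) / (n_r - 1)
    ss_res = sum(
        (scores[i][j] - p_means[i] - r_means[j] + grand) ** 2
        for i in range(n_p)
        for j in range(n_r)
    )
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    var_p = max((ms_p - ms_res) / n_r, 0.0)
    var_r = max((ms_r - ms_res) / n_p, 0.0)
    return var_p, var_r, ms_res


def g_and_sem(var_p, var_r, var_res, n_assessors):
    """Absolute G coefficient (phi) and SEM for the mean of n_assessors scores."""
    error_var = (var_r + var_res) / n_assessors
    total = var_p + error_var
    g = var_p / total if total > 0 else 0.0
    return g, error_var ** 0.5


# Hypothetical scores: 4 projects rated by 3 assessors (not data from the study).
scores = [
    [4, 5, 4],
    [2, 3, 2],
    [3, 3, 4],
    [5, 4, 5],
]
var_p, var_r, var_res = variance_components(scores)
for n in (1, 2, 3):
    g, sem = g_and_sem(var_p, var_r, var_res, n)
    print(f"{n} assessor(s): G = {g:.2f}, SEM = {sem:.2f}")
```

Because the error variance is divided by the number of assessors whose scores are averaged, G rises and SEM falls as assessors are added; the same variance components therefore also determine how many assessors a given level of reliability requires.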

A central question underlying the present study concerns the reliability of assessments based on Preffi 2.0. For the overall assessment of a project, two assessors should suffice to produce a sufficiently reliable and accurate judgement, whereas three assessors would be required at the cluster level and between 4 and 12 assessors (average 6.36) at the criterion level.
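Assessor numbers like these follow from a decision (D) study: given estimated variance components, one asks how many assessors must be averaged before the judgement becomes reliable and accurate enough. The sketch below assumes a crossed projects × assessors design with absolute error variance, and searches for the smallest number of assessors that meets the thresholds used in this study (G > 0.70 and SEM < 0.26). Requiring both thresholds at once, the function name `assessors_needed`, the `max_n` cap and the example variance components are our own assumptions for illustration, not the study's procedure; the meaning of an SEM of 0.26 also depends on the scoring scale used.

```python
def assessors_needed(var_p, var_r, var_res, g_min=0.70, sem_max=0.26, max_n=50):
    """Return the smallest number of assessors whose mean score meets both
    G > g_min and SEM < sem_max, or None if max_n assessors are not enough.

    Assumes a crossed projects x assessors design with absolute error
    variance (var_r + var_res) / n for the mean of n assessors.
    """
    for n in range(1, max_n + 1):
        error_var = (var_r + var_res) / n
        total = var_p + error_var
        g = var_p / total if total > 0 else 0.0
        sem = error_var ** 0.5
        if g > g_min and sem < sem_max:
            return n
    return None


# Hypothetical variance components for one criterion (not study data).
print(assessors_needed(var_p=0.5, var_r=0.1, var_res=0.4))  # → 8
```

With these hypothetical components the G threshold is already met with three assessors, but the SEM threshold requires eight, which illustrates how an accuracy requirement can drive the number of assessors needed at the criterion level.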

For 17 of the 39 criteria, a sufficiently accurate assessment also yields an acceptable G coefficient (>0.70). Although this is not yet sufficient, the findings of the present study are better than those for the draft version. For 29 criteria, the present version requires fewer assessors than the draft version to achieve an SEM < 0.26. The average number of assessors needed per criterion has dropped from 8.13 for the draft version to 6.36 for the present version. This may be due to the improvements made to the draft version (all revised criteria produced better scores), to the training provided to the assessors, or to the larger number of projects in the present study, which gave the respondents more practice with the instrument.

Our overall conclusion is that the reliability and accuracy of Preffi 2.0 are not yet sufficient to allow a project to be assessed by a single assessor. Indeed, both the participants in the consensus meeting and we ourselves question whether this would ultimately be desirable.

Certain improvements could increase Preffi's reliability. Firstly, assessors should discuss their assessments in a consensus meeting. Secondly, a consultation with the project leader should be included in the assessment procedure to supply information missing from the project description, especially on project management aspects. Finally, adequate scoring of some Preffi criteria requires an expert opinion. This cannot be adequately addressed in a list of general guidelines such as Preffi; instead, the general insights provided by Preffi will have to be worked out for specific themes. Such theme-specific studies can then indicate the relevant theories for the domain concerned, suitable interventions and specific success and failure factors. In the future, Internet-based expert systems linked to a digital version of Preffi could provide this.

Further research will have to show whether these measures actually improve the reliability of the Preffi instrument.


    References
1. Molleman GRM, van Driel W, Keijsers JFEM. Preventie Effectiviteits-instrument, PREFFI 1.0. Ontwikkeling van een effectiviteitsinstrument voor de gvo/preventiepraktijk. Utrecht, the Netherlands: Landelijk Centrum GVO 1995.

2. Molleman GRM. Implementing the Preffi: the use of guidelines for practitioners in the Netherlands. In Norheim L and Waller M (Eds.). Best Practices, a Selection of Papers on Quality and Effectiveness in Health Promotion. Helsinki, Finland: Finnish Centre for Health Promotion 1999 pp. 219–230.

3. Molleman GRM, Peters LWH, Hommels LM, Ploeg MA. Assessment Package; Health Promotion Effect Management Instrument Preffi 2.0. Woerden, the Netherlands: NIGZ 2003.

4. Peters LWH, Molleman GRM, Hommels LM, Ploeg MA, Hosman CMH, Llopis E. Explanatory Guide Preffi 2.0. Woerden, the Netherlands: NIGZ 2003.

5. Keijsers JFEM and Saan JAM. The development of two instruments to measure the quality of health promotion interventions. In Davies J and Macdonald G (Eds.). Quality, Evidence and Effectiveness in Health Promotion. London: Routledge 1998 pp. 117–29.

6. Molleman GRM, Ploeg MA, Hosman CMH, Peters LWH. Preffi 2.0: un outil néerlandais pour analyser l'efficacité des interventions en promotion de santé. Promot Educ 2004 22–7.

7. Van den Broucke S, Molleman GRM, Broesskamp-Stone U, Speller V, Saan JAM. ‘Getting evidence into practice’: tools and processes for health promotion, special session of the GPHPE/European. Health 2004. Melbourne: IUHPE 2004.

8. Van den Broucke S. Practice built on evidence: guidelines and quality tools for health promotion. Best Practice for Better Health; 6th IUHPE European Conference on the Effectiveness and Quality of Health Promotion. Stockholm: FHI 2005.

9. Kok HH, Vermeulen TRN. Quality Assurance Tools Database. NIGZ/GEP-Project. Available at: http://subsites.nigz.nl/systeem3/site2/index.cfm?fuseaction=Pages.showPages&code=337. Downloaded on September 26, 2005.

10. Green LW and Kreuter MW. Health Promotion Planning: An Educational and Ecological Approach. Mountain View, CA: Mayfield 1999.

11. Bartholomew LK, Parcel GS, Kok G, Gottlieb NH. Intervention Mapping: Designing Theory- and Evidence-Based Health Promotion Programs. Mountain View, CA: Mayfield 2001.

12. Molleman GRM and Hosman CMH. Ontwikkeling van een kwaliteitsinstrument voor de effectiviteit van gvo/preventie-programma's; de Preffi 1.0, ontwikkeling en ervaringen en uitgangspunten voor een Preffi 2.0. TSG, Tijdschrift voor gezondheidswetenschappen 2003 81:238–46.

13. Meurs LHv. Concept Preffi 2.0; Reliability and Usefulness, Research Paper (in Dutch). NIGZ, Woerden, the Netherlands, 2002.

14. Molleman GRM. Preffi 2.0: Health Promotion Effect Management Instrument; Development, Validity, Reliability and Usability. Woerden, the Netherlands: NIGZ 2005.

15. Shavelson RJ and Webb NM. Generalizability Theory: A Primer. Newbury Park, CA: Sage Publications 1991.

16. Searle SR, Casella G, McCulloch CE. Variance Components. New York: John Wiley 1992.

17. Nunnally JC. Psychometric Theory. New York: McGraw-Hill 1967.

18. Vries Nd. Groepsbeslissingen. In Koele P and van der Pligt J (Eds.). Beslissen en beoordelen. Amsterdam: Boom 1993 pp. 251–84.

19. Wagenaar WA. Logisch voorwaarts en intelligent achterwaarts; modellen voor het stellen van diagnoses in de hulpverlening. In van der Ploeg JD and van den Berg PM (Eds.). Besluitvorming en jeugdhulpverlening. Leuven, Belgium: Acco 1987 pp. 9–18.

Received on November 29, 2004; accepted on August 22, 2005




This article has been cited by other articles:


E Koppelaar, J J Knibbe, H S Miedema, and A Burdorf
Determinants of implementation of primary preventive interventions on patient handling in healthcare: a systematic review
Occup. Environ. Med., June 1, 2009; 66(6): 353 - 360.

