Validation of the BRIEF-P teacher version in a Galician (Spain) school sample

The preschool version of the Behavior Rating Inventory of Executive Function (BRIEF-P) is a rating scale increasingly used in research and widely used in clinical settings for the assessment of executive skills. However, studies of its measurement properties are still scarce, and validation studies outside North America are almost nonexistent. In the present study we analyzed its psychometric features in a sample of 452 preschool children who were rated by their teachers. The results of an exploratory factor analysis indicated a good fit of the clinical scales to the three-factor model proposed by the authors of the BRIEF-P, as well as high internal consistency. The raw scores obtained were significantly lower than those reported for the normative sample, which underscores the need to construct norms adapted to our school population.

"Executive functions" (EF) is an umbrella term encompassing goal-oriented control functions of central importance in daily life, including maintaining and updating information, inhibiting inappropriate processes, and shifting flexibly between tasks (Miyake et al., 2000). These abilities emerge early in life and continue to develop until adolescence or early adulthood (Romine & Reynolds, 2005). It is well known that executive abilities play an important role in the development of other abilities during childhood, including learning skills and adaptive functioning (Blair & Razza, 2007; McClelland, Cameron, Connor, Farris, Jewkes, & Morrison, 2007; St. Clair-Thompson & Gathercole, 2006). Furthermore, executive impairment is a core feature of several acquired and developmental disorders (Pennington & Ozonoff, 1996).
Executive skills have classically been assessed by means of laboratory tasks from the neuropsychological literature; for children, various adaptations of adult tasks have been developed (see Carlson, 2005). These tasks have been criticized for lacking ecological validity and having limited clinical utility, as they capture performance in small ascertainment windows and so cannot adequately capture the cross-temporal nature of EF (Barkley, 2011). In the last decade, however, rating instruments of executive functioning for children and adults have increasingly been developed as an alternative means of assessment (Gioia, Espy & Isquith, 2003; Gioia, Isquith, Guy & Kenworthy, 2000; Thorell & Nyberg, 2008). Ratings have the advantage of being cost-effective and of capturing behavior over an extended period of time, so that EF is assessed as it is used in daily life.
Among the available rating scales, the Behavior Rating Inventory of Executive Function (BRIEF) was the first to be developed and is also the most widely used in research. It was originally conceived by Gioia and co-workers (Gioia et al., 2000) for assessing EF in childhood, and since its first publication three other versions have appeared: Self-Report (BRIEF-SR, for adolescents from 13 to 18), Adult (BRIEF-A), and Preschool (BRIEF-P), the last of which is the focus of the present study.
The BRIEF-P is a questionnaire for parents and teachers/caregivers of children of preschool age. It was designed to assess EF difficulties in children from 2 years through 5 years 11 months. The 63 items in the scale describe everyday behaviors of preschoolers as indicators of EF difficulties. Items are arranged into five scales, each assessing a different aspect of EF. The Inhibit scale measures the child's level of inhibitory control; the Shift scale measures the ability to switch from one situation or action to another as circumstances demand; and the Emotional Control scale measures the child's capacity to modulate emotional reactions. The Working Memory scale measures the child's ability to hold the necessary information in mind to perform ongoing actions or activities until their completion. The Plan/Organize scale assesses the child's ability to manage current and future task demands (e.g., skills such as anticipating future events and organizing information, actions, or materials to achieve a goal). Raw scores of the five clinical scales are combined into three broader indexes (Inhibitory Self-Control, Flexibility, and Emergent Metacognition) and an overall composite score (the Global Executive Composite).
Ratings of executive function in the early years have demonstrated not only sensitivity to different manifestations of executive dysfunction in clinical conditions, but also clinical utility.
An emerging issue in the use of rating scales is the existence of cross-country differences in ratings that reflect not true differences in executive skills, but rather cultural biases about whether the behaviors assessed are considered typical or atypical for children of a given age within a particular cultural environment. Differences of this kind were found by Thorell et al. (2013) in a cross-cultural study examining parent and teacher ratings on the Childhood Executive Functioning Inventory (CHEXI) across four countries, which led the authors to stress the need for culturally adapted norms. With regard to the BRIEF-P, a previously published study validating the scale in a sample of Catalonian children (Bonillo, Araujo Jiménez, Jané Ballabriga et al., 2012) found lower raw mean scores than those reported for the normative American sample, and consequently proposed alternative normative measures for that population.

Present Study
The BRIEF-P is increasingly used in research and widely used in clinical settings. However, studies of its measurement properties are still scarce, and validation studies outside North America are almost nonexistent. Therefore, the purpose of this study is to examine comprehensively its psychometric properties, its measurement model, and any differences between our data and those of the original normative sample. To this end, we used data from a school sample of Galician preschool children, drawn from a larger study on the relation between BRIEF-P teacher ratings and the risk of developing ADHD (Veleiro Vidal, 2011).

Participants
A total of 455 preschoolers (4 and 5 years old; 46% girls) participated in this study, from 20 classroom groups belonging to 7 public primary schools in the area of A Coruña (Galicia, NW Spain). The mean age of the 5-year-old sample was 67.71 months (Median: 67.25 months; SD: 3.01 months; Range: 62.00-72.00 months), while the mean age of the 4-year-old sample was 54.86 months (Median: 54.0 months; SD: 2.85 months; Range: 50.00-59.60 months). As the objective of this study was to test the instrument in non-clinical school samples, data from three children were excluded because of neurodevelopmental conditions (two cases of cerebral palsy and one case of Autism Spectrum Disorder), so that the final number of participants was 452.

Instruments
The Behavior Rating Inventory of Executive Function-Preschool Version (BRIEF-P; Gioia et al., 2003) is a 63-item questionnaire describing everyday behaviors of preschoolers that represent indicators of EF difficulties. All items are scored on a three-point Likert scale: 1 (never), 2 (sometimes), or 3 (always). Higher scores therefore indicate greater executive dysfunction. Raw scores are converted into T scores (mean = 50, SD = 10) or percentiles according to normative tables for boys and girls, and for younger (2-0 to 3-11) and older (4-0 to 5-11) preschoolers. For our study, the teacher form was translated into Spanish and Galician, and finally arranged in a bilingual form. Galician is the local, co-official language spoken in the autonomous region of Galicia in Spain. Both Galician and Spanish are Romance languages, although there are some important differences between them in terms of wording and grammar. Galician is the official language for academic records and documents in the public school system in Galicia, and also the native language of a large proportion of school teachers, so a Galician version of the BRIEF-P for use in Galician schools is clearly useful. The translations into Galician and Spanish were made by the first author of this study and then back-translated into English by two native bilingual professional translators. Where the back-translation differed from the original, the final wording was agreed by consensus among the three persons.
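As an illustration of the T-score metric mentioned above (mean = 50, SD = 10), the following minimal sketch applies a linear transformation to a raw score. It is only an assumption that a linear conversion is appropriate here: the published BRIEF-P norms are read from normative tables, and the function name and the numeric values below are hypothetical.

```python
def t_score(raw, norm_mean, norm_sd):
    """Linear T score: 50 at the reference-group mean, 10 points per SD.

    Assumption: a simple linear conversion. Published norms are table-based
    and need not match this formula exactly.
    """
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    z = (raw - norm_mean) / norm_sd  # standardized distance from the mean
    return 50.0 + 10.0 * z


# Hypothetical reference values, not taken from any BRIEF-P table.
example = t_score(raw=30.0, norm_mean=24.0, norm_sd=4.0)
```

Under this convention, a raw score 1.5 SD above the reference mean corresponds to a T score of 65.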

Data analyses
The psychometric properties of the BRIEF-P were examined by replicating the earlier analyses in the original normative study: internal consistency of the clinical scales and goodness of fit of the measurement model suggested by the developers of the BRIEF-P.
Cronbach's alpha was computed for all the clinical scales and indexes. An exploratory factor analysis (EFA) for a three-factor solution with PROMAX rotation was conducted, entering the clinical scale scores as the variables to analyze. Additionally, we tested the difference between boys' and girls' scores, as well as any differences between the Galician scores and those of the original normative study.
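For readers unfamiliar with the coefficient, Cronbach's alpha for a scale of k items can be sketched as follows. This is a generic textbook computation, not the software actually used in the study; the function name and the toy ratings are hypothetical.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a list of rows (one row of item scores per child).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    k = len(ratings[0])
    if k < 2:
        raise ValueError("at least two items are required")
    # Sample variance of each item across children.
    item_vars = [variance([row[j] for row in ratings]) for j in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(row) for row in ratings])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Toy data on the 1-3 response scale: three children, two items (hypothetical).
toy = [[1, 2], [2, 1], [3, 3]]
alpha = cronbach_alpha(toy)
```

With real data, each row would hold one child's responses to the items of a single clinical scale.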

Table 1.
Correlations between scales and indexes of the BRIEF-P in the present study and those in the original normative sample (Gioia, Espy & Isquith, 2003)

Factor structure of the BRIEF-P
The correlations between the scales and indexes of the BRIEF-P are shown in Table 1; the values below the diagonal are those reported by the BRIEF-P authors in their normative study. All scales and indexes were significantly correlated (p < .001), with coefficients ranging from .37 (Inhibit with Shift) to .89 (Working Memory with Plan/Organize), so that the maximum recommended limit of intercorrelation (.90) was not exceeded (Tabachnick & Fidell, 1996).
We replicated the analytic strategy used by Gioia et al. and therefore performed an exploratory factor analysis (EFA) with PROMAX rotation on the five clinical scale scores (instead of the individual items), exploring a three-factor model. The results showed that the eigenvalues of the 2nd and 3rd factors were lower than 1.0. The same pattern had been found in the analyses by Isquith and colleagues (Isquith, Gioia & Espy, 2004), who nevertheless adopted a three-factor model on the basis of (a) theoretical considerations, (b) adequacy of separation among the variables, and (c) cumulative percentage of variance accounted for. In our analysis, the 2nd factor accounted for 17% of the variance and the 3rd factor for 11% (in the original normative study, the figures were 16% and 11%, respectively).
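The reported percentages are consistent with eigenvalues below 1.0, as the following sketch shows. It assumes that the percentages are shares of the total variance of five standardized scale scores, so that each eigenvalue equals its share multiplied by the number of variables; the extraction method is not stated in the text, and the function names are hypothetical.

```python
def eigenvalue_from_share(share_percent, n_variables):
    """Eigenvalue implied by a percentage of total variance.

    Assumption: variables are standardized, so total variance equals the
    number of variables (eigenvalues of a correlation matrix sum to it).
    """
    return share_percent / 100.0 * n_variables


def share_from_eigenvalue(eigenvalue, n_variables):
    """Percentage of total variance carried by one eigenvalue."""
    return 100.0 * eigenvalue / n_variables


# Shares reported for the 2nd and 3rd factors, with five clinical scales.
second = eigenvalue_from_share(17, 5)  # approximately 0.85
third = eigenvalue_from_share(11, 5)   # approximately 0.55
```

Under the same assumption, an eigenvalue of exactly 1.0 would correspond to 20% of the variance, so any factor explaining less than 20% of the variance of five variables falls below the eigenvalue-greater-than-one rule.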
Table 2 shows the factor loadings in the pattern matrix for the rotated solution, which accounted for 92.55% of the variance (very similar to the 92% in the normative study). The results indicate that the data in our sample replicate the factor structure proposed by Gioia, Espy and Isquith (2003) in the original BRIEF-P normative study.
Table 2. Loadings in the Pattern Matrix for a three-factor solution of the clinical scale scores of the BRIEF-P
Notes: N = 452. Rotation method: PROMAX. Factor 1: Emergent Metacognition. Factor 2: Flexibility. Factor 3: Inhibitory Self-Control. Factor loadings lower than 0.3 are not shown. Correlations in parentheses are from the normative data presented by Gioia, Espy & Isquith (2003).

Internal Consistency
Internal consistency measures whether several items that are proposed to measure the same general construct produce similar scores. The most widely used index of internal consistency is Cronbach's alpha. For the BRIEF-P, Gioia and co-workers (2003) reported high values for the clinical scales (from 0.90 to 0.97) as well as for the indexes (from 0.93 to 0.97). The results obtained with our sample are similar to, or slightly lower than, the normative ones. The alpha coefficients of the two samples were compared by means of Feldt's W (Feldt, Woodruff & Salih, 1987), and significant differences were found for the Inhibit, Shift and Plan/Organize scales, as well as for the Flexibility index and the GEC (see Table 3). Nevertheless, the alpha values obtained may be considered good or excellent, except for the Shift scale (0.79), which falls within the range considered acceptable (George & Mallery, 2003).
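For reference, the test for comparing two alpha coefficients from independent samples is usually stated as the ratio below. This is the standard textbook form, given here as an assumption; the exact variant and degrees of freedom used in the present analysis are not reported in the text. The subscripts 1 and 2 denote the two samples, and N their sizes.

```latex
W = \frac{1 - \hat{\alpha}_{2}}{1 - \hat{\alpha}_{1}}
\;\sim\; F_{\,N_{1}-1,\; N_{2}-1}
\quad \text{under } H_{0}\colon \alpha_{1} = \alpha_{2}
```

A value of W far from 1, judged against the F distribution with the stated degrees of freedom, indicates that the two coefficients differ.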

Sex differences
In the normative study, boys were rated by their teachers as having greater difficulties than girls on the Inhibit, Working Memory, and Plan/Organize scales, and significant differences were found in the Inhibitory Self-Control and Emergent Metacognition indexes, as well as in the Global Executive Composite. Separate standardized normative scores were therefore calculated for each gender group. As Table 4 shows, the comparisons by sex in the present study indicate that girls received significantly lower scores than boys on all scales and indexes (p < .001), with differences of medium size (Cohen's d from 0.38 to 0.64).

Comparisons between the scores in the current and the normative sample
Table 4 shows the comparisons between the mean scores and standard deviations in our study and those reported for the normative sample. The ratings made by the teachers in the Galician sample are lower than those in the sample used for the development of the BRIEF-P. Student's t tests for independent samples were performed and Cohen's d was computed; the differences between the two samples were significant for all scales and indexes. Effect sizes were medium to large (Cohen's d from 0.48 to 1.37).
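Comparisons of this kind can be computed from summary statistics alone (means, standard deviations and sample sizes). The sketch below shows one common way to do so: Cohen's d with a pooled standard deviation, and a t statistic that does not assume equal variances (Welch's form). Which pooling formula the original analysis used is not stated, so this is an assumption, and all numeric values are hypothetical.

```python
from math import sqrt


def cohens_d(m1, s1, n1, m2, s2, n2):
    """Cohen's d using the pooled standard deviation of two groups."""
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(pooled_var)


def welch_t(m1, s1, n1, m2, s2, n2):
    """t statistic and approximate degrees of freedom, unequal variances."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2  # squared standard errors
    t = (m1 - m2) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical summary statistics for two groups (not values from Table 4).
d = cohens_d(10.0, 2.0, 50, 8.0, 2.0, 50)
t, df = welch_t(10.0, 2.0, 50, 8.0, 2.0, 50)
```

Because both functions take only summary statistics, they can be applied to published normative means and standard deviations when the raw normative data are unavailable.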
Gioia and his colleagues established that scores more than 1.5 SD above the mean are suggestive of dysfunction on each scale. Accordingly, Table 5 reports the cutoffs calculated from our sample alongside those of the original normative sample. Raw scores corresponding to percentiles above the 90th are also provided.
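The two kinds of cutoff described above can be sketched as follows, assuming access to the raw scale scores of a sample. The function names and toy scores are hypothetical, and the percentile interpolation rule is one choice among several; the rule used for Table 5 is not stated.

```python
from statistics import mean, quantiles, stdev


def sd_cutoff(scores, k=1.5):
    """Raw-score cutoff located k sample standard deviations above the mean."""
    return mean(scores) + k * stdev(scores)


def percentile_90(scores):
    """90th percentile, using linear interpolation within the observed data."""
    # quantiles(..., n=10) returns the nine decile cut points; the last one
    # is the 90th percentile.
    return quantiles(scores, n=10, method="inclusive")[-1]


# Toy raw scores for one scale (hypothetical, not study data).
toy_scores = [16, 17, 17, 18, 19, 20, 21, 23, 26, 31]
cut_sd = sd_cutoff(toy_scores)
cut_p90 = percentile_90(toy_scores)
```

In a positively skewed distribution the two criteria can flag different numbers of children, which is one reason for reporting both.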

Discussion
This study aimed to test the psychometric properties of the Spanish/Galician teacher version of the BRIEF-P in a sample recruited from public schools. We conducted an EFA in order to examine how well the collected ratings fit the factor structure proposed by Gioia, Espy and Isquith (2003) in their normative study. We found that the five scales of the BRIEF-P fit adequately into the three-factor structure originally proposed: Inhibitory Self-Control, Flexibility and Emergent Metacognition. The amount of variance accounted for in our study was similar to that found in the original study, and the internal consistency results ranged from adequate to high. (Diamond, 2006).
Rating scales assume that all respondents share an understanding of the nature of the behavior being rated and of the meaning of the anchor points provided on the response scale. This source of measurement error is acknowledged in the user guides of rating scales, but valid use of the norms still requires that the persons assessed belong to the population on which the standardized norms were constructed. As stated above, there may be important differences across countries in how typical or atypical EF behaviors are considered to be (Thorell et al., 2013); culturally adapted norms are therefore clearly needed. In our study, the teachers in our sample gave their children significantly lower scores than those in the American normative sample. These differences were medium to large in size, and caution must therefore be taken when the original norms are used to assess preschoolers in our school population. As an alternative, cutoffs from our sample are provided. These results are in line with those of Bonillo et al. (2012), and suggest that, within the Spanish context, raters may be less prone to consider the behaviors described by the BRIEF-P items as atypical. This research has some limitations. Although the children in our sample were recruited from several public schools, no sampling method was used to ensure that the sample is representative of the Galician (or Spanish) school population.
Parent ratings were not available in this study. Although teachers' ratings have been shown to have more predictive power than parents' ratings (see Thorell et al., 2013), a comprehensive assessment of EF difficulties should draw on both, as well as on other sources of information.

Table 3.
Internal consistency (Cronbach's α) coefficients for the BRIEF-P in the study sample and the normative sample

Table 4
Average scores and standard deviations of the BRIEF-P in our sample and in the original sample
Notes: a Student's t test for independent samples. Unequal variances were assumed, as Levene's test for equality of variances showed statistically significant differences (p < 0.001) for all comparisons; b Cohen's d.

Table 5.
Cutoffs found in the present study and from the normative sample by Gioia et al. (2003)