VIII Analysis of data
Interpreting and Using School FSA Results:
A guide for schools in the Richmond School District
The purpose of this section of the handbook is to suggest ways to approach the interpretation and use of Foundation Skills Assessment (FSA) data at the school level in school planning processes. Some of this information has been drawn from the Ministry’s booklet as well as the website found at www.bced.gov.bc.ca/assessment/fsa/interpret.htm.
There are two sections to this discussion: Preliminary Comments, and Interpretation Questions. The preliminary comments are intended to provide a foundation of basic understandings about the FSA assessment itself and the FSA data provided to schools by the Ministry. The interpretation questions suggest specific ways to examine FSA data.
Preliminary Comments
The FSA data are valid and reliable in themselves, but provide only a single snapshot of information within the broader array of information available to schools from their own ongoing assessment. The results should not be interpreted in isolation from this other data, and should not be seen as superior or more reliable. On the other hand, FSA data can provide insights that school data alone could not.
Interpretation of Results
Meaning is not inherent in the data itself, but is something that we create together through interpretation. Before data can be interpreted, it must be organized. At this point it might be called information. When organizing data, decisions have to be made about how it will be displayed. These decisions either emphasize or obscure aspects of the data.
For example, the Ministry year-to-year summary of a school’s FSA data shows the percentage of students meeting or exceeding expectations, but not the difference between that score and the district or provincial average. The advantage of this choice is that it de-emphasizes comparison and focuses on the school’s results. However, since the provincial average varies quite significantly from year to year depending on the testing instrument, some information is lost in this display. To make up for this, the summary shows the “cumulative proportion” of students meeting or exceeding expectations over a period of several years and compares that to the provincial average. This corrects for the ups and downs of the test results but obscures trend lines. The graphic display at the bottom of the page shows those trends, but the vertical scale is so compressed that they may not easily be seen. Any method of organizing and displaying the data will have its strengths and its weaknesses. When interpreting data, attention should be paid to such effects. Sometimes, new methods of displaying the data may be required.
Information becomes useful only when it leads to understanding. Understanding results from a process of interpretation that derives meaning from the information. When a group of people looks at a school’s assessment results, it is reasonable to expect that individuals may interpret the same information differently. The goal, however, is to reach a common understanding of the issues (e.g., identifying areas of strength, areas in need of improvement, and steps that should be taken to maintain or improve student performance). Reaching this common understanding requires a willingness on the part of all involved to consider a range of perspectives and interpretations.
In the end, data/information is only as meaningful as the questions we ask about it and the conversations that result. Moreover, the consequences for students will be positive only to the degree that common understandings and commitments result from interpretation of and dialogue about the data. Thus, the dialogue that is stimulated by FSA data, or any other data, is the source of meaning and of benefit for the school, not the data itself.
Contextual Considerations
When reviewing and interpreting FSA results, schools and School Planning Councils should consider local factors that might influence their results. Schools have significant control over some of these factors (e.g., participation rate on FSA, local policy, instructional strategies used) but not others (e.g., diversity of student population, socio-economic variables, population mobility). Consequently, comparison of school results to other schools and district and provincial averages should be done only with caution. A better than average result is not necessarily reason to be satisfied and a poorer than average result is not necessarily an indication of a problem.
Characteristics of FSA Results
The assessment is designed to measure cumulative learning for grade groupings. For example, students use skills they have gained from Kindergarten through the spring of Grade 4 when they complete the Grade 4 assessment. Thus, results at any one grade level are the result of, and have implications for, all preceding grade levels.
FSA results are reported in terms of the number of students “not yet within expectations,” “meeting expectations,” or “exceeding expectations” for that grade level. The level of expectations is set by a reference group of experienced teachers when developing scoring guides. The level varies from year to year according to the particular items on the test.
FSA is not designed to be comprehensive or diagnostic, but rather to provide a snapshot of how well students are attaining important foundation skills in relation to provincial standards. FSA results are meant to complement other information gathered in the district, school, or classroom. They should not be considered to be a superior source of information but simply an additional source of information.
Writing results are based on two samples of student work: one extended writing task and one shorter, focused task. Student results should not be interpreted as representing final, polished work. They are a measure of ability to write a draft response to a contrived prompt in a limited amount of time. This is a useful skill and provides valid information about writing ability in such a “test” setting, but it is different from what is normally done in classrooms and from the way writing is normally done in authentic applications in the “real world.”
Significant Differences
No test is an absolutely accurate measure of student ability. If the same test were given to the same students on a different day, the results would always differ to some degree. Therefore, care must be taken in interpreting results, and particularly in comparing two results.
The confidence interval of a test result is determined by mathematical formulae and takes into account such issues as reliability and sampling. A 90% confidence interval is commonly used. This indicates that, on repeated administrations of the test, the interval would be expected to contain the “true” result about 9 times out of 10. FSA results are given with a 90% confidence interval indicated; for example, a result of 86% ±10% would mean that, with 90% confidence, the “true” result lies between 76% and 96%.
If two results differ but their confidence intervals overlap, then the apparent difference is not statistically significant. For example, suppose a school report shows a proportion of students meeting or exceeding expectations of 78% ± 2% in numeracy for the current year (76% to 80%), and the previous year’s results showed 74% ± 3% for the same component (71% to 77%). The two intervals overlap between 76% and 77%, so there is no statistically significant difference between the two results and they should be considered to be the same.
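The overlap check described above can be sketched in a few lines of Python. This is only an illustration of the rule of thumb as stated in this guide, not a Ministry tool; the names Result and intervals_overlap are hypothetical, and the figures are the ones used in the example above.

```python
from typing import NamedTuple


class Result(NamedTuple):
    """A reported percentage and its 90% confidence margin (hypothetical helper)."""
    percent: float  # proportion meeting or exceeding expectations, in percent
    margin: float   # the "plus or minus" value reported with the result

    @property
    def low(self) -> float:
        # Lower end of the confidence interval, never below 0%.
        return max(0.0, self.percent - self.margin)

    @property
    def high(self) -> float:
        # Upper end of the confidence interval, never above 100%.
        return min(100.0, self.percent + self.margin)


def intervals_overlap(a: Result, b: Result) -> bool:
    """Return True when the two confidence intervals share at least one value."""
    return a.low <= b.high and b.low <= a.high


current = Result(percent=78, margin=2)   # 76% to 80%
previous = Result(percent=74, margin=3)  # 71% to 77%

# True: the intervals overlap, so the guide treats the results as the same.
print(intervals_overlap(current, previous))
```

If the check returns False, the difference may be statistically significant, but its educational significance still has to be judged through discussion.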
Even if there is a statistical difference between two results, the educational significance of the difference remains a matter of interpretation. This is not a “yes-no” question. It is a matter of deciding what the educational significance of the statistical information is. This is best done through discussion by an interested and informed group of stakeholders who share their personal understandings, speculations and experiences to arrive at a common interpretation, and at decisions about what actions might be reasonable and effective in addressing any concern or opportunity revealed through their discussion.
Types of Comparisons That Can Be Made
It is virtually impossible to interpret a single test result. Comparison to another result is almost always required. There are several common types of comparisons.
School results can be compared to other schools and to district and provincial averages. While this may not be the most useful form of comparison, it is inevitable that such comparisons will be made, so they should be made thoughtfully and with careful consideration of context. Otherwise, invalid interpretations can easily result.
Another way to look at elementary school results is in terms of internal comparisons between Grade 4 and Grade 7 results. This may indicate the effect that the school program is having on students in the intervening grades. However, one must also recall that the results in a particular year can be strongly influenced by the particular students taking part. Thus, differences may reveal more about the two groups of students than the educational program in the school. Nonetheless, this comparison can yield useful information.
Results at a single grade level can also be broken down into results for subgroups, such as boys and girls, First Nations, ESL or French Immersion students. In comparing the results of subgroups to the whole grade, the same cautions apply as when comparing between grades, particularly when the number of students involved is small. Results for the entire grade level can also be broken down into subtopics.
In all cases, observed differences based on a single year should be treated with great caution because they may not be either statistically or educationally significant. Much greater confidence comes from observing a stable pattern over a period of at least three years.
Participation Rates
In Richmond every effort is made to have all students participate in FSA unless they are absolutely unable to do so because of a mental or physical handicap, or such limited English that they could not understand the exam. No attempt is made to exclude students who we feel are unlikely to do well. Nonetheless, due to the nature of our student body, district participation rates for FSA are generally lower than the provincial average in the elementary grades, but they are higher in Grade 10 for reasons that are not yet clear. Participation rates may vary quite significantly between schools depending upon the student demographics.
Interpretation Questions
There is no complete set of standard questions that can or should be asked when looking at FSA data. Interpretation of data is an exploration that can lead in many directions. What follows, however, are some basic questions that should always be considered as a starting point.
1. What results would you have expected to see given your knowledge of the school and its students? Discuss those predictions as a group, making sure to include the reasons for your predictions and not just the predictions themselves. The purpose of this preliminary exercise is to establish an anticipatory mindset that will sensitize you to unexpected aspects of the data, and thus to what it can reveal.
2. Review the complete set of results in a general way, looking for anything that stands out because it is highly unexpected, either much better than expected or much worse than expected. Bearing in mind the local context and the characteristics of the particular year of students that was being assessed, is there anything that seems immediately interesting or noteworthy about the results on the basis of a “once over lightly”?
3. Compare the school, district and provincial results for the current year in terms of the number of students not yet within expectations, meeting expectations and exceeding expectations. Is there a statistically significant difference in any of these? In light of all the contextual factors you are aware of, would you have expected this difference? Bear in mind that local demographics may be significantly different from provincial demographics and even within Richmond there may be significant differences between schools. How does the participation rate for your school compare with district and provincial rates? Have any differences that you notice also been present in previous years? Is there any trend to the overall results or to the differences? (Remember that three years is the minimum time required to speculate on the existence of a trend and any conclusion must be considered tentative unless it is confirmed over a longer period.)
4. Look at the breakdown of overall results according to subscales. Are there any statistically significant differences between the subscales? If so, has this difference appeared in previous years? (This information is not available on the summary the Ministry provides. You will have to look at the data summary from previous years if there is a difference you wish to explore.)
5. Look at the various parts of the data by gender, ESL, French Immersion (if applicable) and First Nations (if applicable) to see whether or not there is anything in the comparison between general results and the results of these subgroups that seems noteworthy. Is there any difference between the results for boys and girls? Are the ESL results consistent with what one might reasonably expect in an assessment of this population (bearing in mind that only students in the very introductory phases of ESL are excused from the assessment so that many of the ESL students may be working with an emerging command of English)?
6. Look at the percentage of students that are below expectations and the percentage of students that exceed expectations. Does this seem unusual in any way? Is one group significantly larger than the other?
7. In elementary schools compare the results for Grades 4 and 7 in terms of the general shape of the results and also the details, including the relative number of students below expectations and exceeding expectations, the comparative results of boys and girls, and the performance of ESL students. Is there any inconsistency between the performance of students at the two grades?
8. What additional data would you need in order to confirm any theories that have arisen in your discussions about the cause of results or to verify that suggestions for improving them are valid?
Follow-up Discussion
If there appear to be any statistically significant differences or patterns to the data, the next step is to consider what might have caused them, whether or not they are also educationally significant, and what they might reveal. Is the nature of the student population or the community changing? Have there been any changes in curriculum, resources or instructional strategies used in the school? Remember that the results at any grade level are the cumulative consequence of learning in previous grades so discussion must be broader than a single grade level.
Interpretation of FSA data and determining the implications and possibilities for students is the critical part of the review exercise for schools and School Planning Councils. The data does not speak for itself. In fact, as noted earlier, the data itself has no meaning without interpretation. Consequently, the quality of the dialogue is what will determine the quality of the consequences for students.
Dialogue is a complex social process intended to help a group construct common understandings of something. It depends upon a climate of trust, common purpose and willingness to wonder without preconceived conclusions. To be most effective, a dialogue must be sustained over a long enough period for ideas to emerge, evolve and mature into shared understandings. Haste can make not only waste but also misunderstanding. Thus, School Planning Council discussions, and discussions in the school community, should not be compressed exclusively into an intense period following release of FSA data but should be ongoing, and should include consideration of other indicators.
It is suggested that School Planning Council interpretation of FSA results be recorded in minutes in sufficient detail that the record can be of use in subsequent years by reminding future SPCs of assumptions, speculations, suggestions and questions from the past.