Peabody Fellows Programs
Figure 1. Average baseline vs. post-PD student test scores across numerous Peabody Fellows programs.
Table 1. Relevant information about each of the Peabody Fellows institutes in Figure 1.

Peabody Fellows Measures of Success

Student Learning

Although we have ample evidence that teachers enjoy attending our Peabody Fellows professional development institutes and feel they benefit greatly from doing so, the most important question about any professional development opportunity is whether the students benefit. As the data collected by our outside evaluator demonstrate, we have compelling evidence that our programs have resulted in significant student learning.


To examine this, the team at the Yale Peabody Museum, together with the evaluator and classroom teachers, created tests that covered the specific science content and process skills from the different Peabody Fellows Institutes and were closely tied to the Connecticut State Science Standards. The participating teachers administered these tests twice: at the end of the school year prior to their involvement in the institute (what we call the Baseline condition), and again after they had taught lessons drawing on what they learned and developed in the summer institute. The tests were not graded by the teachers; they were scored by independent scorers arranged by the Museum. They were also deliberately difficult, with no expectation that most students would answer every question correctly even after learning the material.


Figure 1 shows the comparisons as the average scores for the same teachers on both tests. In all cases, scores increased substantially in the year the institute-developed materials were taught, even though the teachers had been expected to teach this same content in their classes prior to participation. Given that increases were found in every case, and in almost every case were statistically significant (p < 0.05; the differences for IMLS 6.2 were not significant because of the small number of participants), we are reasonably confident that the effects are the product of participation in the institutes.
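A comparison like the one described above can be sketched with a standard two-sample t statistic. The scores below are hypothetical, for illustration only, and are not the actual program data; the function name `two_sample_t` is ours.

```python
import math

def two_sample_t(a, b):
    """Pooled two-sample t statistic comparing two groups' mean scores."""
    def mean(xs):
        return sum(xs) / len(xs)

    def var(xs):
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    na, nb = len(a), len(b)
    # Pooled variance across the two groups
    sp2 = ((na - 1) * var(a) + (nb - 1) * var(b)) / (na + nb - 2)
    # Standard error of the difference between the means
    se = math.sqrt(sp2 * (1 / na + 1 / nb))
    return (mean(b) - mean(a)) / se

# Hypothetical baseline-year and post-institute test scores
baseline = [40, 55, 35, 60, 45, 50]
post = [55, 70, 50, 75, 60, 65]
t = two_sample_t(baseline, post)
# With df = 10, |t| > 2.23 corresponds to p < 0.05 (two-tailed)
```

In practice an evaluator would use a statistics package that also returns the p-value directly; the sketch above only shows how the test statistic itself is formed.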


Table 1 provides a brief description of each institute shown in Figure 1 (the only institutes for which the complete set of test data has so far been collected and analyzed), along with the year of the institute, the number of teachers, and an important measure of the magnitude of the effect: the effect size. The effect size is calculated by taking the difference between the average scores on the post-test and the baseline test and dividing it by the pooled standard deviation of the two groups. It expresses the difference between the groups in standard deviations, which is independent of the scale or difficulty of the test. In general, effect sizes from 0.2 to 0.4 are considered small, 0.4 to 0.8 medium, and 0.8 and above large. Very few educational innovations are able to show large effect sizes.
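The calculation described above is the standard Cohen's d. A minimal sketch follows; the scores are hypothetical, for illustration only, and are not the actual program data.

```python
import math

def cohens_d(baseline, post):
    """Effect size: difference in means divided by the pooled standard deviation."""
    def mean(xs):
        return sum(xs) / len(xs)

    def var(xs):
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    n1, n2 = len(baseline), len(post)
    pooled_sd = math.sqrt(
        ((n1 - 1) * var(baseline) + (n2 - 1) * var(post)) / (n1 + n2 - 2)
    )
    return (mean(post) - mean(baseline)) / pooled_sd

# Hypothetical baseline-year and post-institute test scores
baseline = [40, 55, 35, 60, 45, 50]
post = [55, 70, 50, 75, 60, 65]
d = cohens_d(baseline, post)  # about 1.6: a "large" effect by the usual thresholds
```

Because the difference is expressed in standard-deviation units, the same d value means the same thing whether a test is scored out of 20 points or 100, which is why it is useful for comparing across institutes.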


As can be seen in Table 1, all of our effect sizes are well above 0.8. Clearly, students performed much better on the tests after their teachers participated in the professional development than the students taught by those same teachers prior to the professional development. Since the tests were closely related to the Connecticut State Science Standards, we believe that these findings are strong evidence that the students learned important science content and process skills as a function of their teachers’ participation in the institutes.