Beware of testing too many hypotheses; the more you torture the data, the more likely they are to confess, but confession obtained under duress may not be admissible in the court of scientific opinion.
—Stigler (as cited in Mark & Gamble, 2009, p. 210)
Misinterpretation and abuse of statistical tests, confidence intervals, and statistical power have been decried for decades, yet remain rampant.
—Greenland, Senn, Rothman, Carlin, Poole, Goodman, & Altman, 2016, p. 337
In This Chapter
Common types of statistics used for quantitative data analysis are defined, along with methods for choosing among them. Computer software for quantitative analysis is discussed.
Interpretation issues relevant to quantitative data analysis are discussed, including randomization, sample size, statistical versus practical significance, cultural bias, generalizability, and options for reporting quantitative results, such as effect sizes and variance accounted for, replication, use of nonparametric statistics, exploration of competing explanations, recognition of a study’s limitations, and a principled discovery strategy.
Effect sizes as a part of statistical synthesis (i.e., meta-analysis) as a literature review method are explained.
Options for qualitative analysis are described, along with selected computer programs that are available.
Interpretation issues related to qualitative data analysis are discussed, including use of triangulated data, audits, cultural bias, and generalization of results.
Mixed methods analysis and interpretation issues are addressed.
Development of a management plan for conducting a research study is described as a tool to be included in the research proposal.
Writing research reports is described in terms of dissertation and thesis requirements, alternative reporting formats (including performance), and publication issues. Digital reporting and dissemination strategies are discussed.
Strategies are discussed for improving the probability of the utilization of your research results.
By reading and studying this book, you have moved through the steps of preparing a research proposal or critiquing a research study to the point of data analysis. If you are preparing a research proposal, your next step is to describe the data analysis strategies that you plan to use. In most research proposals, this section is followed by a management plan that specifies what tasks you will complete within a specified time frame and what resources will be required to complete the research project. Then, you would be in a position to complete the research study itself and to write up the results. Thus, the organizing framework for this chapter is designed to take you through the data analysis and interpretation decisions, the design of a management plan, and ideas concerning writing and disseminating research. If your goal is to critique research (rather than conduct it yourself), you will find guidelines that will help you identify the strengths and weaknesses of this portion of a research study.
A final section addresses the utilization of research results. Although this section appears at the end of this text, ideas to enhance utilization have been integrated throughout the descriptions of the research planning process in this text. If you wait until after the research is finished to consider utilization, chances are that your research could become a “dust catcher” on someone’s shelf, an unused computer file, or an unvisited website. That would not be a happy ending after all your work, so it is important to build in strategies for utilization during your planning process.
Quantitative Analysis Strategies
Will struggling first-grade readers who participate in the Reading Recovery program make greater gains in their reading achievement than struggling readers who do not participate in that program (Sirinides, Gray, & May, 2018)? How do experiences of discrimination relate to thoughts of dropping out among Latina/o students (McWhirter, Garcia, & Bines, 2018)? These are the types of questions that researchers investigate using quantitative research methods. Brief descriptions of two studies that explored answers to these questions are provided in Box 13.1. The analytic and interpretive strategies used in these studies are provided as examples of the various concepts described in this section of the chapter.
Commonly Used Quantitative Data Analysis Techniques
It is not possible to explain all the different types of statistics, the derivation of their formulas, and their appropriate uses in this chapter. The reader is referred to general statistics books for more specific information on this topic (see, e.g., Carlson & Winquist, 2017; Field, 2017; Vogt, Vogt, Gardner, & Haeffele, 2014). First, I take you on a little side trip into scales of measurement territory, as the scale of measurement has implications for decisions about which statistic to use. Second, I discuss computer software programs used for quantitative data analysis and define and give examples of some of the more commonly used quantitative data analysis techniques. Then, I provide you with a model to aid you in making decisions about the most appropriate data analysis techniques. Finally, I discuss issues related to the interpretation of quantitative data analysis results.
Scale of Measurement
Before presenting definitions of commonly used statistics and explaining in detail the decision strategy for choosing a statistical procedure, I wish to digress for a moment to describe one concept on which the basis for choosing a statistical procedure rests—the scale of measurement. As a researcher you need to ask: What is the scale of measurement for the data for both the independent and dependent variables?
Box 13.1 Brief Descriptions of Two Quantitative Studies
Study 1: The impacts of Reading Recovery at scale: Results from the 4-year i3 external evaluation (Sirinides et al., 2018)
The researchers wanted to test the effectiveness of Reading Recovery, a pull-out program that provides daily 30-minute one-to-one instruction in addition to the regular classroom instruction. They compared students in the Reading Recovery condition with students who received reading instruction “as usual” with supplemental supports. “Means in the treatment group are one third to one half of a standard deviation larger than the control group means” (p. 13). The authors conclude that this is evidence that Reading Recovery is “an effective intervention that can help reverse struggling readers’ trajectories of low literacy” (p. 16).
Study 2: Discrimination and other education barriers, school connectedness, and thoughts of dropping out among Latina/o students (McWhirter, Garcia, & Bines, 2018)
The researchers studied the relationship between Latina/o adolescents’ experiences with discrimination and their thoughts of dropping out of high school. Experience with discrimination was measured by rating 16 experiences that reflect discrimination (e.g., teachers think you are less smart) using a scale from 1 = never to 5 = daily. Thoughts of dropping out of school were measured with two items (e.g., I might drop out of school) using a scale from 1 = not at all true to 5 = very true. The researchers reported a significant relationship between experiences of discrimination and thoughts of dropping out of school. However, they also noted that the effect of discrimination could be reduced by enhancing a sense of school connectedness.
The four scales of measurement are defined and examples of each are provided in Table 13.1. The scale of measurement is important because it determines which type of statistical procedure is appropriate. As you will see later, this has an influence on deciding between parametric and nonparametric statistics as well as on the appropriate choice of correlation coefficient.
The choice of a statistical procedure is outlined in Table 13.2. Your choice will depend on the following factors:
Your research question, which can be descriptive, concern the extent of relationships between variables, determine the significance of group differences, predict group membership, or examine the structure of variables
The type of groups that you have (i.e., independent, dependent, repeated measures, matched groups, randomized blocks, or mixed groups)
The number of independent and dependent variables you have
The scale of measurement
Your ability to satisfy the assumptions underlying the use of parametric statistics
Each type of research question leads you to a different statistical choice; thus, this is the most important starting point for your decisions.
You are almost ready to dive into the different types of statistics, but before jumping into complex statistical analysis, it is important to really understand what your data look like. Statisticians recommend that you always graph your data before you start conducting analyses. This will help you in several respects. First, you will be closer to your data and know them better in terms of what they are capable of telling you. Second, the graphs will help you determine if you have met the assumptions required for different types of analyses. Third, you will be able to see if you have any “outliers,” that is, values for variables that are very different from the general group response on your measure.
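A look at the data before analysis can be sketched in a few lines of Python. The sketch below is only an illustration under stated assumptions: the scores are hypothetical whole numbers, the histogram is drawn as text, and the rule used to flag outliers (values more than 1.5 interquartile ranges beyond the quartiles) is one common convention that I have chosen for the example, not a rule taken from the studies discussed in this chapter.

```python
from statistics import quantiles

def text_histogram(scores, bin_width):
    """Return one text line per bin, with one '*' for each score in the bin."""
    counts = {}
    for score in scores:
        start = score // bin_width * bin_width  # lower edge of this score's bin
        counts[start] = counts.get(start, 0) + 1
    lines = []
    start = min(scores) // bin_width * bin_width
    while start <= max(scores):
        stars = "*" * counts.get(start, 0)
        lines.append(f"{start:>3}-{start + bin_width - 1:<3} {stars}")
        start += bin_width
    return lines

def flag_outliers(scores):
    """Flag scores beyond 1.5 interquartile ranges from the quartiles (an assumed rule)."""
    q1, _, q3 = quantiles(scores, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [s for s in scores if s < low or s > high]

# Hypothetical whole-number scores; one value sits far from the rest.
scores = [12, 14, 15, 15, 16, 17, 17, 18, 19, 20, 21, 48]

for line in text_histogram(scores, bin_width=5):
    print(line)
print("Possible outliers:", flag_outliers(scores))
```

A flagged value is a prompt to go back and check the data, not an automatic reason to delete a case.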
Table 13.1 Scales of Measurement
Scale of Measurement | Definition | Example
Nominal | Categorical data | Color: red, green, blue; Label: male, female
Ordinal | Ranked data organized according to increasing or decreasing presence of a characteristic | Tallest to shortest; sweetest to sourest; heaviest to lightest
Interval | Equal intervals, but zero is arbitrary | Temperature, IQ, many personality and educational tests
Ratio | Equal intervals, and zero is defined as meaning the absence of the characteristic | Weight, age
Computers and Quantitative Analysis
It would be highly unusual for researchers to analyze their quantitative data by hand; many statistical packages are available for this purpose. I grew up using SPSS; current versions are very intuitive (as opposed to versions from the 1960s). Many students will make their decision about the software package to use based on what their university supports. There are other criteria to consider in making this decision, such as the comprehensiveness of the package, the cost, the learning curve required, the training provided by the developer, the platform it runs on (Mac or Windows), and the types of data that can be analyzed. If you do a Google search for the top 10 statistical software packages, you will get thousands of hits. The top 10 software packages are listed as SPSS, SAS, Stata, Minitab, JMP, NCSS, SYSTAT, ANALYSE-IT, PSPP, and MedCalc. If you search for the top 10 free statistical packages, you get this list: SAS University Edition, GNU PSPP, Statistical Lab, Shogun, DataMelt, GNU Octave, Zelig, Develve, Dataplot, and SOFA Statistics. As each of these packages has evolved to be more user friendly, the processes for doing various analyses have become more intuitive. Most software packages provide video training at their websites with opportunities to practice with the software before purchase or download.
Statistics can be thought of as being descriptive (i.e., they describe characteristics of your sample), correlational (i.e., they describe the strength and direction of relationships), and inferential (i.e., they allow you to make group comparisons). Box 13.2 provides definitions of the most commonly used descriptive, correlational, and inferential statistics.
Box 13.2 Definitions of Commonly Used Statistics
Descriptive Statistics: Statistics whose function it is to describe or indicate several characteristics common to the entire sample. Descriptive statistics summarize data on a single variable (e.g., mean, median, mode, standard deviation).
Measures of Central Tendency
Mean: The mean is a summary of a set of numbers in terms of centrality; it is what we commonly think of as the arithmetic average. In graphic terms, it is the point in a distribution around which the sum of deviations (from the mean point) is zero. It is calculated by adding up all the scores and dividing by the number of scores. It is usually designated by an X with a bar over it (X̄) or the capital letter M.
Median: The median is the midpoint in a distribution of scores. This is a measure of central tendency that is equidistant from low to high; the median is the point at which the same number of scores lies on one side of that point as on the other.
Mode: The mode is a measure of central tendency that is the most frequently occurring score in the distribution.
Measures of Variability
Range: The range is a measure of variability that indicates the total extension of the data; for example, the numbers range from 1 to 10. It gives the idea of the outer limits of the distribution and is unstable with extreme scores.
Standard Deviation: The standard deviation is a measure of variability: the square root of the average squared deviation of the scores from the mean. It is a useful statistic for interpreting the meaning of a score and for use in more sophisticated statistical analyses. The standard deviation and mean are often reported together in research tables because the standard deviation is an indication of how adequate the mean is as a summary statistic for a set of data.
Variance: The variance is the standard deviation squared and is a statistic used in more sophisticated analyses.
Correlational Statistics: Statistics whose function it is to describe the strength and direction of a relationship between two or more variables.
Simple Correlation Coefficient: The simple correlation coefficient describes the strength and direction of a relationship between two variables. It is designated by the lowercase letter r.
Coefficient of Determination: This statistic is the correlation coefficient squared. It depicts the amount of variance that is accounted for by the explanatory variable in the response variable.
Multiple Regression: If the researcher has several independent (predictor) variables, multiple regression can be used to indicate the amount of variance that all of the predictor variables explain.¹
Inferential Statistics: Statistics that are used to determine whether sample scores differ significantly from each other or from population values. Inferential statistics are used to compare differences between groups.
Parametric Statistics: Statistical techniques used for group comparison when the characteristic of interest (e.g., achievement) is normally distributed in the population; randomization is used in sample selection (see Chapter 11) and/or assignment (see Chapter 4), and the interval or ratio-level of measurement is used (e.g., many test scores).
t tests: Inferential statistical tests used when you have two groups to compare. If the groups are independent (i.e., different people are in each group), the t test for independent samples is used. If two sets of scores are available for the same people (or matched groups), the t test for correlated samples is used.
ANOVA: The analysis of variance is used when you have more than two groups to compare or when you have more than one independent variable.
ANCOVA: The analysis of covariance is similar to the ANOVA, except that it allows you to control for the influence of an independent variable (often some background characteristic) that may vary between your groups before the treatment is introduced.
MANOVA: The multivariate analysis of variance is used in the same circumstances as ANOVA, except that you have more than one dependent variable.
Structural Equation Modeling: SEM is used to test complex theoretical models or confirm factor structures of psychological instruments. It can assess relationships among both manifest (observed) and latent (underlying theoretical constructs) variables. For further information, see Vogt et al. (2014).
Nonparametric Statistics: Statistical techniques used when the assumption of normality cannot be met, when sample sizes are small, or when the data are ordinal (rank) or nominal (categorical).
Chi-Square: Used with nominal-level data to test the statistical independence of two variables.
Wilcoxon Matched Pairs Signed-Ranks Test: Used with two related samples and ordinal-level data.
Mann-Whitney U Test: Used with two independent samples and ordinal-level data.
Friedman Two-Way Analysis of Variance: Used with more than two related samples and ordinal-level data.
Kruskal-Wallis One-Way Analysis of Variance: Used with more than two independent samples and ordinal-level data.
Descriptive Statistics
Researchers commonly report means and standard deviations for the descriptive statistics portion of their report. The usual format is to first state the mean and then show the standard deviation in parentheses immediately following the mean. Sirinides et al. (2018) used the Iowa Tests of Basic Skills to measure reading achievement. The results were as follows: for the experimental group, a mean of 138.8 with a standard deviation of 7.5; for Control Group 1, 135.4 (7.2). Sample size is usually indicated by the letter n, and in this case, the sample sizes for the experimental and control groups were identical: 3,444 in each group. In their study, McWhirter et al. (2018) reported descriptive statistics (means and standard deviations) for the different measures: discrimination experience 1.80 (.69), educational barriers 1.69 (.48), and school connectedness 3.85 (.72). Their sample size was 819.
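The descriptive statistics defined in Box 13.2 can be computed with Python's built-in statistics module. The following is a minimal sketch with hypothetical scores; it assumes the sample formulas (dividing by n − 1) for the standard deviation and variance, which is what most statistical packages report by default.

```python
from statistics import mean, median, multimode, stdev, variance

# Hypothetical scores for a single variable.
scores = [2, 3, 3, 4, 5, 5, 5, 6, 7, 10]

summary = {
    "n": len(scores),
    "mean": mean(scores),                # arithmetic average
    "median": median(scores),            # midpoint of the distribution
    "mode": multimode(scores),           # most frequent score(s), as a list
    "range": max(scores) - min(scores),  # distance between the extreme scores
    "sd": stdev(scores),                 # sample standard deviation
    "variance": variance(scores),        # sample variance (the SD squared)
}

# Report in the usual format: mean followed by the SD in parentheses.
print(f"M = {summary['mean']:.2f} ({summary['sd']:.2f}), n = {summary['n']}")
```

Reporting the mean and standard deviation side by side, as in the last line, mirrors the format used in the two studies described above.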
Correlational Statistics
McWhirter et al. (2018) wanted to test the strength of the relationship between their predictor variables and thoughts of dropping out of high school. They reported simple correlation coefficients between the variables:
Correlation analyses indicate that students with greater levels of perceived discrimination and educational barriers were more likely to have thoughts of dropping out (r = .20 and r = .22, respectively) and had lower levels of school connectedness (r = −.19 and r = −.31, respectively). Those with higher levels of school connectedness were less likely to have thoughts of dropping out (r = −.32). (p. 335)
The letter r is used to stand for the correlation coefficient statistic. They also chose to use a hierarchical regression technique that allowed them to test the relationship of individual predictor variables in the same statistical analysis. They found that each of the predictor variables contributed to the thoughts of dropping out. However, they also found that students with a higher level of school connectedness, despite their experience with discrimination, were less likely to think about dropping out (beta = .11, SE = .05, p < .01, F = 43.18, p < .001, and R² = .07).
In English, this parenthetical expression would be read: Beta equals .11, standard error equals .05, and significance level of p is less than .01 for the racial discrimination variable. The F value is a test of the statistical significance of the full model of prediction of thoughts of dropping out when other barriers and school connectedness are considered. In English, this reads: F equals 43.18 and a significance level of p less than .001.
Beta is a standardized regression coefficient obtained by multiplying the regression coefficient by the ratio of the standard deviation of the explanatory variable to the standard deviation of the response variable. Thus, a standardized regression coefficient is one that would result if the explanatory and response variables had been converted to standard z scores prior to the regression analysis. This standardization is done to make the size of beta weights from regression analysis easier to compare for the various explanatory variables.
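The two descriptions of beta given above (rescaling the regression coefficient, or converting both variables to z scores first) lead to the same number. The sketch below checks this with hypothetical data and, as a simplifying assumption, a single explanatory variable; the function names are mine, and a real analysis with several predictors would be run in a statistical package.

```python
from statistics import mean, stdev

def slope(x, y):
    """Least-squares regression coefficient for predicting y from x."""
    mean_x, mean_y = mean(x), mean(y)
    cross_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    squares_x = sum((a - mean_x) ** 2 for a in x)
    return cross_products / squares_x

def z_scores(values):
    """Convert raw scores to standard z scores."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical explanatory (x) and response (y) scores on different scales.
x = [1, 2, 3, 4, 5]
y = [20, 10, 40, 30, 50]

b = slope(x, y)                               # unstandardized coefficient
beta_rescaled = b * stdev(x) / stdev(y)       # coefficient times SD(x) / SD(y)
beta_from_z = slope(z_scores(x), z_scores(y)) # regression run on z scores

print(f"b = {b:.2f}, beta = {beta_rescaled:.2f}, beta from z scores = {beta_from_z:.2f}")
```

With only one explanatory variable, beta is also equal to the simple correlation coefficient r; that equality does not hold once other predictors are in the model.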
Researchers use the symbol R² to indicate the proportion of variance in the response variable (in this case, thoughts of dropping out) explained by the explanatory variable (in this case, experience with racial discrimination) in this multiple regression. F is the statistic used to determine the statistical significance of this result. That is, is the contribution of the explanatory variable to the prediction of the response variable statistically significant? And p is the level of statistical significance associated with F. (Statistical significance is explained in the next section.)
Degrees of freedom (df) identify which F distribution applies when determining the significance of the reported F statistic. F distributions are a family of distributions with two parameters: the degrees of freedom in the numerator of the F statistic (based on the number of predictor variables or groups) and those associated with the denominator (based on the sample size). Given the number of explanatory variables and the sample size, the computer program will calculate the appropriate degrees of freedom and will use the appropriate sampling distribution to determine the level of statistical significance.
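The link among R², F, and the two degrees of freedom can be made concrete with the standard formula for the overall F test in multiple regression. The sketch below uses hypothetical values for R², the sample size, and the number of predictors; the function name is an assumption of mine, and the numbers are not taken from McWhirter et al. (2018).

```python
def f_from_r_squared(r_squared, n, k):
    """Overall F test for a regression with k predictors and n participants."""
    df_numerator = k            # based on the number of predictor variables
    df_denominator = n - k - 1  # based on the sample size
    f_value = (r_squared / df_numerator) / ((1 - r_squared) / df_denominator)
    return f_value, df_numerator, df_denominator

# Hypothetical example: 3 predictors explain 25% of the variance, n = 104.
f_value, df1, df2 = f_from_r_squared(r_squared=0.25, n=104, k=3)
print(f"F({df1}, {df2}) = {f_value:.2f}")
```

The statistical package then looks up this F value in the F distribution with those two degrees of freedom to obtain p.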
On the basis of these results, McWhirter et al. (2018) conclude that experiences of racial discrimination may increase students’ thoughts of dropping out of high school. They recognize the power of interventions that promote a feeling of connectedness with the school. They hypothesize that discrimination could be reduced by training peers, teachers, and staff to recognize and interrupt overt forms of discrimination. This might also include training to recognize covert discrimination in the form of lower expectations and reduction of referrals for disciplinary issues.
As noted in Chapter 5 on causal comparative and correlational approaches, researchers should not interpret correlation as meaning causation, as a correlational statistic can be calculated between any two variables. If a strong positive correlation was found between shoe size and income, the researcher could not conclude a causal relationship. It would be erroneous to conclude that increasing shoe size would result in higher incomes or that higher incomes would increase shoe size. However, finding a strong positive correlation between two variables does not mean that they are not causally related (e.g., number of cigarettes smoked and incidence of lung cancer). [Please excuse my double negative in the previous sentence; it seems to make the point clearly.]
Statistical Significance
Statistical significance testing can be defined in terms of the calculated probability (p), with possible values between .00 and 1.00, of the sample statistics, given the sample size, and assuming the sample was derived from a population in which the null hypothesis (H₀) is exactly true. The null hypothesis is the statement that the groups in the experiment do not differ from one another or that there is no relationship between two or more variables. Several important concepts are included in that description: Statistical testing is probability based, sample size influences statistical significance, the sample should be representative of the population, and the probability that is calculated is the probability of obtaining results at least as extreme as those observed if the null hypothesis were true; it is not the probability that the null hypothesis itself is true or false. A test of statistical significance indicates whether researchers can reject or fail to reject the null hypothesis at a chosen level of risk of being wrong.
When you read in a research report that the results were significant at the .05 level (usually depicted as p < .05), the researcher is telling you that results at least as extreme as those observed would occur less than 5% of the time if the null hypothesis were true. In other words, by using the .05 level, the researcher accepts a 5% risk of saying there is a statistically significant difference (or relationship) when there really is not. This mistake is called a Type I error. (The converse of this is a Type II error; that is, the researcher fails to reject a false null hypothesis.) In the McWhirter et al. (2018) study, the researchers rejected the null hypothesis that there is no relationship between experiences of discrimination and thoughts of dropping out of school. Their hierarchical linear regression results produced a statistical significance level of .001. Thus, the researchers rejected the null hypothesis that no relationship existed between experiences of discrimination and thoughts of dropping out of school when other barriers and levels of school connectedness were considered.
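A small simulation can show what the .05 level means in practice. In the sketch below, every simulated study compares two groups drawn from the same population, so the null hypothesis is true by construction and every "significant" result is a Type I error. To keep the example self-contained I have substituted a simple z test with a known population standard deviation for the t test a researcher would normally use; the sample sizes, number of studies, and random seed are all arbitrary assumptions.

```python
import random
from statistics import NormalDist, mean

def two_sided_p(group_a, group_b, sigma=1.0):
    """p value from a z test of two means, assuming a known population SD."""
    standard_error = sigma * (1 / len(group_a) + 1 / len(group_b)) ** 0.5
    z = (mean(group_a) - mean(group_b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

def type_i_error_rate(n_studies=2000, n_per_group=30, alpha=0.05, seed=1):
    """Share of simulated studies that reject a null hypothesis that is true."""
    rng = random.Random(seed)
    false_alarms = 0
    for _ in range(n_studies):
        # Both groups come from the same population: there is no real difference.
        group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if two_sided_p(group_a, group_b) < alpha:
            false_alarms += 1
    return false_alarms / n_studies

rate = type_i_error_rate()
print(f"Proportion of false alarms at alpha = .05: {rate:.3f}")
```

The proportion printed should land near .05, which is the long-run risk the researcher agreed to accept by choosing that level.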
The concept of statistical significance is not unproblematic (Greenland et al., 2016). The American Psychological Association (APA) actually considered banning the use of statistical significance testing. However, they revised their position to recommend the use of effect sizes, confidence intervals, and meta-analysis to provide a more accurate picture of effects (Appelbaum et al., 2018). Issues associated with the decision to use a test of statistical significance are discussed in two subsequent sections of this chapter: Interpretation Issues in Quantitative Analysis and Options for Reporting Statistical Results.
Inferential Statistics
Researchers with two groups or one group with two points of measurement can use a t test to compare scores, if the data are continuous. With two groups, you would use an independent t test; with one group and two points of measurement, you would use a t test for dependent samples. For example, Lister-Landman, Domoff, and Dubow (2017) wanted to know if there was a difference between males and females in their levels of compulsive texting. They used an independent t test to compare the means for males and females. They reported: “There was a significant difference between males’ (M = 1.81, SD = 0.54) and females’ (M = 2.18, SD = 0.67) levels of compulsive texting, such that females endorsed significantly higher levels of compulsive texting than did males t(354) = −5.73, p < .01, indicating that gender should be considered in further analyses regarding compulsive texting” (p. 317). (The number 354 is the degrees of freedom based on having 356 participants.)
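The independent t test can be computed from the means, standard deviations, and sample sizes alone. The sketch below uses the pooled-variance form of the test and plugs in the means and standard deviations quoted above. The split of the 356 participants into two groups of 178 is purely my assumption for illustration, so the result should not be expected to match the published value exactly.

```python
from math import sqrt

def independent_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df

# Means and SDs as quoted in the text; the group sizes (178 and 178) are hypothetical.
t_value, df = independent_t(1.81, 0.54, 178, 2.18, 0.67, 178)
print(f"t({df}) = {t_value:.2f}")
```

The negative sign simply reflects the order of subtraction: the first group's mean is lower than the second's.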
If researchers want to compare two groups when the data are categorical (e.g., frequency data), then a t test would be inappropriate. However, they can use a chi-square test (χ²) to determine statistically significant differences between groups. For example, Sirinides et al. (2018) wanted to test whether there was a difference between students who were included in their analyses as compared to students who were excluded for a variety of reasons. They conducted chi-square tests on the frequency of students in each group on the basis of race. They found a significant difference (p < .001) indicating that significantly more non-White students were dropped from the analysis than White students.
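The chi-square statistic compares the frequencies observed in each cell of a table with the frequencies expected if the two variables were independent. Here is a minimal sketch with a hypothetical 2 × 2 table of counts; it omits the continuity correction that some packages apply to 2 × 2 tables, and the counts are invented rather than drawn from Sirinides et al. (2018).

```python
def chi_square(table):
    """Chi-square statistic and degrees of freedom for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * column_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Hypothetical counts: rows are two groups, columns are two categories.
observed = [[30, 10],
            [20, 40]]
statistic, df = chi_square(observed)
print(f"chi-square({df}) = {statistic:.2f}")
```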
Researchers have two other nonparametric tests for comparisons of ordinal (rank) data with two groups. In Lindgren, Baigi, Apitzsch, and Bergh’s (2011) study of an exercise program for high school girls, they used the Mann-Whitney U test for comparisons between groups and the Wilcoxon matched-pairs signed-ranks test for comparisons within groups. Huang (2018) used the Mann-Whitney U test to compare the pre and post levels of anxiety for social science doctoral students who took a statistics test. He chose this statistical test because his sample size was very small (n = 13). (By the way, there was a significantly lower level of anxiety about statistics for the groups after they had taken the statistics course.)
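The Mann-Whitney U test works on ranks rather than raw scores. The sketch below, using hypothetical ratings for two independent groups, ranks all the scores together (tied scores share an average rank) and converts each group's rank sum to a U value. The function names are mine, and obtaining a p value from U is left to a statistical package or a table of critical values.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the average of their ranks."""
    first_position, counts = {}, {}
    for position, value in enumerate(sorted(values), start=1):
        first_position.setdefault(value, position)
        counts[value] = counts.get(value, 0) + 1
    return [first_position[v] + (counts[v] - 1) / 2 for v in values]

def mann_whitney_u(group_a, group_b):
    """Return (U, U for group A, U for group B); U is the smaller of the two."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b), u_a, u_b

# Hypothetical ordinal ratings (1 = low, 5 = high) for two independent groups.
program_group = [4, 5, 3, 5, 4]
comparison_group = [2, 3, 1, 2, 3]
u, u_program, u_comparison = mann_whitney_u(program_group, comparison_group)
print(f"U = {u}")
```

A U close to zero means the two groups' ranks barely overlap.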
The definition of ANOVA in Box 13.2 is a bit oversimplified. If researchers have one independent variable with more than two levels (e.g., three approaches to reading instruction), then they would use ANOVA. However, if the researchers had more than one independent variable (e.g., time of test administration and exposure to the experimental treatment), then they would need to conduct a factorial ANOVA. For example, McCarthy et al. (2017) (Sample Study 1) had two independent variables in their study of a program to reduce adolescent depression: the effect of the time that the test was administered and the effect of the exposure to treatment or not. If this was what they wanted to test, they could have done a 4 × 2 factorial ANOVA. The factorial ANOVA allows you to test for main effects (i.e., Is there a significant effect for each of the independent variables?) and for interaction effects (i.e., Does the effect of one independent variable depend on the level of another?). Before drawing conclusions about main effects, it is important to also test for interaction effects. Interpretation of interaction effects is made far easier by graphing the disaggregated results.
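The logic of ANOVA is easiest to see in the one-way case: F is the ratio of the variability between group means to the variability within groups. The sketch below computes F for three hypothetical groups (think of three approaches to reading instruction). It covers only the simplest design; a factorial ANOVA partitions the between-groups variability further into main effects and interactions, and I would leave that to a statistical package.

```python
from statistics import mean

def one_way_anova(groups):
    """Return F and its degrees of freedom for a one-way, independent-groups ANOVA."""
    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)
    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of individual scores around their own group mean.
    ss_within = sum((score - mean(g)) ** 2 for g in groups for score in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical scores for three instructional approaches.
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_value, df_between, df_within = one_way_anova(groups)
print(f"F({df_between}, {df_within}) = {f_value:.2f}")
```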
Two nonparametric analysis of variance tests can also be used: the Friedman two-way analysis of variance for more than two related samples with ordinal data and the Kruskal-Wallis one-way analysis of variance for more than two independent samples with ordinal data. For example, Gannon, Becker, and Moreno (2012) studied the effect of religiosity on mentions of sexual behavior on Facebook for college freshmen. They determined that the characteristics of interest in their sample were not normally distributed; therefore, they used the Kruskal-Wallis test to analyze references to sexual behavior by religious affiliation and mentions of religiosity on their Facebook pages. Their findings included (a) number of sexual references was not different for having or not having a religious affiliation and (b) a significant difference did appear between those who made frequent references to religiosity versus those who did not.
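The Kruskal-Wallis test can be sketched the same way as the Mann-Whitney U test, by ranking all scores together and comparing the rank sums of the groups. The example below uses hypothetical data and the basic formula for the H statistic without the correction for tied ranks that statistical packages usually apply, so treat it as an illustration of the idea only.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the average of their ranks."""
    first_position, counts = {}, {}
    for position, value in enumerate(sorted(values), start=1):
        first_position.setdefault(value, position)
        counts[value] = counts.get(value, 0) + 1
    return [first_position[v] + (counts[v] - 1) / 2 for v in values]

def kruskal_wallis_h(groups):
    """Return H (no tie correction) and its degrees of freedom."""
    all_scores = [score for group in groups for score in group]
    ranks = average_ranks(all_scores)
    total_n = len(all_scores)
    weighted_rank_sums = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted_rank_sums += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12 / (total_n * (total_n + 1)) * weighted_rank_sums - 3 * (total_n + 1)
    return h, len(groups) - 1

# Hypothetical counts of references for three independent groups.
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h, df = kruskal_wallis_h(groups)
print(f"H({df}) = {h:.2f}")
```

H is evaluated against the chi-square distribution with the degrees of freedom shown.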
Variations on ANOVA
ANCOVA
It is also possible to have another type of “independent” variable—one that is not of central interest for the researcher but that needs to be measured and accounted for in the analysis. This type of variable is called a covariate. It might be entry-level test scores or socioeconomic indicators for two different groups. In this case, the researcher would use an analysis of covariance (ANCOVA). In the McCarthy et al. (2017) study of adolescent depression, they wanted to control for effects related to age, sex, ethnicity, and family income before they tested the effect of their intervention. These variables were the covariates in their analyses. (To find out which groups had scores that were statistically different from each other, read the section below on post hoc analysis.)
MANOVA
If you have included more than one dependent measure in your design, you may need to use a multivariate analysis of variance (MANOVA). If you are getting into this level of complexity in your analysis, you definitely need to refer to a statistics book. This is the type of analysis that was used by McCarthy et al. (2017) because they had three dependent variables: grades, attendance, and disciplinary outcomes. Therefore, their analysis was quite complex with covariates of age, sex, ethnicity, and family income. As you may recall from Chapter 1, they did not find any statistically significant effects of the intervention when they controlled all these variables.
Post Hoc Analysis
Once you have completed an analysis, such as a three-way ANOVA or a MANOVA, you need to determine where the significant effects are. This can be done using multiple-comparison t tests or other post hoc procedures, such as Tukey’s, Scheffé’s, or Bonferroni post hoc tests. Such post hoc tests allow you to identify which specific groups or variables account for the significant effect demonstrated by your initial analysis. McCarthy et al. (2017) reported that post hoc analyses of the effect of the adolescent depression intervention did not reveal any significant effect based on number of out-of-school suspensions.
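Of the post hoc procedures named above, the Bonferroni adjustment is the simplest to show: the overall significance level is divided by the number of comparisons, so each individual comparison is held to a stricter standard. The sketch below lists every pairwise comparison among a set of hypothetical groups and the adjusted level for each; the group labels and function name are mine.

```python
from itertools import combinations

def bonferroni_plan(group_names, alpha=0.05):
    """List all pairwise comparisons and the Bonferroni-adjusted alpha for each."""
    pairs = list(combinations(group_names, 2))
    return pairs, alpha / len(pairs)

# Hypothetical labels for four groups compared after a significant ANOVA.
pairs, adjusted_alpha = bonferroni_plan(["A", "B", "C", "D"])
for first, second in pairs:
    print(f"Compare {first} with {second} at p < {adjusted_alpha:.4f}")
```

With four groups there are six comparisons, so each is tested at roughly .008 rather than .05.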
Choice of Statistic
Table 13.2 can be used as a flowchart to think logically through your statistical choices. For example, in the McWhirter et al. (2018) study, the researchers first wanted to know to what extent the students in their study experienced racial discrimination. This portion of their study is descriptive; therefore, they could go to the first section in Table 13.2 to identify their statistical options. Their scale of measurement is assumed to be interval, so they determined that the mean was the appropriate measure of central tendency, and the standard deviation was useful to describe variability in the data.
Table 13.2 Choice of a Statistical Procedure
Research question: Descriptive
    For interval or ratio data: Mean, median, or mode, and variance
    For ordinal data: Median
    For nominal data: Mode
Research question: Relationship
    For two variables
        For interval or ratio data: Pearson product-moment coefficient of correlation
        For ordinal data: Spearman rank order coefficient of correlation or Kendall rank correlation
        For interval and nominal or ordinal data: Point biserial
        For interval and artificial dichotomy on an ordinal scale (dichotomy is artificial because there is an underlying continuous distribution): Biserial
        For nominal data: Contingency coefficient
    For more than two variables
        For interval or ratio data: Multiple regression analysis
        For ordinal data (a): Kendall partial rank correlation
        For nominal data: Discriminant analysis
Research question: Group differences
    For two variables
        For related samples
            For interval or ratio data: t test for correlated samples (b)
            For ordinal data: Wilcoxon matched-pairs signed-ranks test
            For nominal data: McNemar test for the significance of changes
        For independent samples
            For interval or ratio data: t test for independent samples
            For ordinal data: Mann-Whitney U test or Kolmogorov-Smirnov two-sample test
            For nominal data: Chi-square test
    For more than two variables
        For related samples
            For interval or ratio data: Repeated measures ANOVA
            For ordinal data: Friedman two-way analysis of variance
            For nominal data: Cochran Q test
        For independent samples
            For interval or ratio data: ANOVA
            For ordinal data: Kruskal-Wallis one-way ANOVA
            For nominal data: Chi-square test for k independent samples
Research question: Prediction of group membership
    For all data: Discriminant function (c)
Research question: Structure of variables
    For interval or ratio data: Factor analysis
(a) Ordinal and nominal data can be used in multiple regression equations through a process called “dummying-up” a variable. Refer to one of the statistical texts cited at the beginning of this chapter for more details on this procedure.
(b) All t tests and variations on ANOVA require that the data satisfy the assumptions for parametric statistical procedures.
(c) Discriminant functions can be one way, hierarchical, or factorial, depending on the number of independent and dependent variables.
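To show how the table works as a flowchart, the group differences portion can be written as a simple lookup. This is only a sketch: the function name and the key labels (such as "interval_or_ratio") are hypothetical conveniences, and the entries simply restate the table rather than add to it.

```python
# Group differences portion of Table 13.2, keyed by
# (number of variables, type of samples, level of measurement).
GROUP_DIFFERENCE_TESTS = {
    ("two", "related", "interval_or_ratio"): "t test for correlated samples",
    ("two", "related", "ordinal"): "Wilcoxon matched-pairs signed-ranks test",
    ("two", "related", "nominal"): "McNemar test for the significance of changes",
    ("two", "independent", "interval_or_ratio"): "t test for independent samples",
    ("two", "independent", "ordinal"):
        "Mann-Whitney U test or Kolmogorov-Smirnov two-sample test",
    ("two", "independent", "nominal"): "Chi-square test",
    ("more than two", "related", "interval_or_ratio"): "Repeated measures ANOVA",
    ("more than two", "related", "ordinal"): "Friedman two-way analysis of variance",
    ("more than two", "related", "nominal"): "Cochran Q test",
    ("more than two", "independent", "interval_or_ratio"): "ANOVA",
    ("more than two", "independent", "ordinal"): "Kruskal-Wallis one-way ANOVA",
    ("more than two", "independent", "nominal"):
        "Chi-square test for k independent samples",
}


def choose_group_difference_test(n_variables, samples, level):
    """Return the procedure Table 13.2 lists for a group differences question."""
    key = (n_variables, samples, level)
    if key not in GROUP_DIFFERENCE_TESTS:
        raise ValueError(f"no entry in the table for {key}")
    return GROUP_DIFFERENCE_TESTS[key]


print(choose_group_difference_test("two", "independent", "ordinal"))
```

A lookup like this only narrows the options; the parametric entries still depend on the assumptions noted in the table footnotes.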
McWhirter et al. (2018) had an additional research question: What was the relationship between students’ experiences of discrimination and their thoughts about dropping out of school? Therefore, they could go to the second section in Table 13.2 because their research question was one of relationships. They have several blocks of predictor variables (background [e.g., gender], racial identity variables, and racial discrimination), with interval data, so they chose to conduct hierarchical regression analysis.
Assumptions for Parametric Statistics
As mentioned in Table 13.2, it is important for you to be aware of the assumptions that underlie the use of parametric statistics. These include (a) normal distribution of the characteristic of interest in the population, (b) randomization for sample selection or group assignment (experimental vs. control), and (c) an interval or ratio level of measurement. Extreme outliers are very unlikely in a normally distributed population, so the presence of outliers in your data suggests that the assumption of normality may not hold. Also, if the distribution of the characteristic in the population is skewed (i.e., bunched up at one end of the continuum or the other), the assumption of normality is not met. In the case of a skewed distribution, it may be possible to transform the data to approximate a normal distribution using a logarithmic transformation (Field, 2017). If the assumptions cannot be met, you need to consider alternative data analysis strategies. That is where nonparametric statistics (sometimes called distribution-free inference procedures) become attractive.2
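The effect of a logarithmic transformation on a skewed distribution can be checked with a short calculation. The sketch below is illustrative only: the scores are hypothetical, the helper name is an assumption, and skewness is computed as the average cubed standardized score, which is one common definition among several.

```python
import math
import statistics


def skewness(values):
    """Average cubed z score; values near 0 indicate a roughly symmetric distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Hypothetical positively skewed scores (most values low, one very high).
scores = [1, 2, 2, 3, 3, 4, 5, 8, 13, 40]
logged = [math.log(v) for v in scores]  # log requires all values to be positive

print(round(skewness(scores), 2), round(skewness(logged), 2))
```

In this example the transformed scores are noticeably less skewed than the raw scores, although a transformation does not guarantee normality and changes the scale on which results must be interpreted.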
Interpretation Issues in Quantitative Analysis
Quantitative researchers face a number of challenges in interpreting the results of their data analysis:
The influence of (or lack of) randomization on statistical choices and interpretation of results
The analytic implications of using intact groups
The influence of sample size on achieving statistical significance
Statistical versus practical significance
Issues related to cultural bias
Variables related to generalizability
Following a discussion of these challenges, I present options for responding to some of them, such as reporting effect sizes and amount of variance accounted for, the use of confidence intervals, replication, use of nonparametric statistics, exploration of competing explanations, recognition of a study’s limitations, and principled discovery strategies (Mark, 2009).
Randomization
Early statisticians (e.g., R. A. Fisher, 1890–1962, and William Sealy Gosset, 1876–1937, inventor of the statistical test called “Student’s t”) based their work on the assumption that randomization is a necessary condition for the use of typical tests of parametric statistics. Randomness can be achieved by either random sampling or random assignment to conditions. Random sampling has to do with how the participants were chosen from the population (see Chapter 11). Random assignment has to do with how participants were assigned to levels of the independent variable so that variability between groups is statistically evened out (see Chapter 4). Random sampling is a very difficult condition to meet in most educational and psychological research, and random assignment is not always possible.
Researchers can have difficulty recruiting schools to be in their study, thus making it impossible to select schools or students at random. In the end, they may be able to assign schools that volunteered to either the experimental or control conditions, even if they cannot randomly assign individual students. They might also be able to randomly assign the treatment conditions by classrooms (not by individual student).
In many situations, it is not possible for ethical or practical reasons to assign people randomly, and it may not be possible to randomly select individuals from a larger population. Much research in education and psychology is done with available populations, and therefore the use of parametric statistics is questionable. If intact classes are used, the class becomes the unit of analysis, thus necessitating either having a large number of classes involved in the research to conduct meaningful statistical analysis or the use of more sophisticated statistical strategies. In studies with intact classes or groups, researchers can choose to use a regression analysis rather than ANOVA. In regression analysis, there is no need to create small groups based on collapsing scores on variables. Thus, this approach can provide a more desirable option because it would not require expanding the sample size.
Sample Size
Sample size is a basic influence on statistical significance (Hubbard & Lindsay, 2008). Virtually any study can have statistically significant results if a large enough sample size is used. For example, with a standard deviation of 10 and a sample size of 20, a difference of 9.4 between two independent means is necessary for statistical significance at the .05 level in a nondirectional test. However, if the sample size is 100, a difference of only 4.0 is required, and with a sample size of 1,000, a difference of only 1.2 is required (Shaver, 1992). An overly large sample size can result in obtaining statistical significance, even though the results may have little practical significance (see the next section for further elaboration of this idea). When researchers are working with low-incidence populations, as commonly happens in special education research, the small sample size itself might prevent the researcher from obtaining statistical significance. Small sample sizes also have implications for the researcher’s ability to disaggregate results by characteristics, such as gender, race or ethnicity, or type of disability. In such cases, the researcher needs to plan a sample of sufficient size to make the disaggregation meaningful. With power analysis (discussed in Chapter 11), a researcher can determine the sample size needed to detect an effect of a given size as statistically significant.
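The figures cited from Shaver (1992) can be approximated with a short calculation. The sketch below rests on assumptions that are not stated in the text: the sample size is treated as the total across two equal groups, and the test is an independent-samples t test with pooled variance. The function names are hypothetical, and the critical value of t is found numerically so that only the Python standard library is needed.

```python
import math


def t_pdf(t, df):
    # Density of Student's t distribution; lgamma avoids overflow for large df.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))


def t_cdf(t, df, steps=2000):
    # P(T <= t) for t >= 0, integrating the density from 0 to t with Simpson's rule.
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(df, alpha=0.05):
    # Two-tailed (nondirectional) critical value, found by bisection on the CDF.
    low, high = 0.0, 20.0
    target = 1 - alpha / 2
    for _ in range(60):
        mid = (low + high) / 2
        if t_cdf(mid, df) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2


def min_significant_difference(sd, total_n, alpha=0.05):
    # Assumes two equal groups of total_n / 2 participants each.
    n_per_group = total_n // 2
    standard_error = sd * math.sqrt(2 / n_per_group)
    return t_critical(total_n - 2, alpha) * standard_error


for total_n in (20, 100, 1000):
    print(total_n, round(min_significant_difference(10, total_n), 1))
```

Under these assumptions the calculation gives differences of about 9.4, 4.0, and 1.2 for total samples of 20, 100, and 1,000, which matches the pattern described above: the same standard deviation requires a much smaller difference as the sample grows.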
Statistical Versus Practical Significance
The influence of the size of the sample on the ease or difficulty of finding statistical significance brings up the issue of statistical versus practical significance (Greenland et al., 2016). Because it is easier to obtain statistical significance with larger samples, researchers need to be sensitive to the practical significance of their results. For example, statistical significance may be obtained in a study that compares two drug abuse treatment interventions. In examining the size of the difference, the data may indicate that the experimental group remained abstinent for only 2 days longer. Thus, the researcher needs to be aware of the practical significance of the results, particularly if there are big differences in the costs of the two programs. Is it worth changing to a much more expensive program to keep someone off drugs for 2 additional days?
Ziliak and McCloskey (2007) use a simple comparison to illustrate the difference between statistical and practical significance:
Crossing frantically a busy street to save your child from certain death is a good gamble. Crossing frantically to get another mustard packet for your hot dog is not. The size of the potential loss if you don’t hurry to save your child is larger, most will agree, than the potential loss if you don’t get the mustard. [Researchers] look only for a probability of success in the crossing—the existence of a probability of success better than .99 or .95 or .90, and this within the restricted frame of sampling—ignoring in any spiritual or financial currency the value of the prize and the expected cost of pursuing it. In the life and human sciences, a majority of scientists look at the world with what we have dubbed “the sizeless stare of statistical significance.” (p. vii)
Cultural Bias
As discussed in Chapter 11 on sampling, use of a label to indicate race when investigating the effects of various social programs can do an injustice in terms of who is included in the study as well as how the results are interpreted and used. Williams (2015) addresses the injustices when stereotypic beliefs based on skin color or other phenotypic characteristics serve as a basis for cultural bias in the research process, including the analysis, interpretation, and use of data. Rather than relying on an overly simplistic category such as race for an explanatory variable to categorize people uncritically and assume homogeneity in their conditions, they argue that researchers need to address the complexity of participants’ experiences and social locations. Random selection and assignment cannot make up for cultural bias. When a researcher explores differences based on race, they need to be aware of the many social factors that are obscured when race itself is used as a variable. To what extent is the racial category “White” used to represent the norm so that behavior for other racial groups is viewed as inferior? How does the interpretation of data on the basis of racial differences lead to ignoring the social conditions that are responsible for unequal chances for high quality education or meaningful work or living in a safe environment?
However, researchers who critically examine race as a distinct cultural pattern realize that interpretations must be tied to the socioeconomic and environmental context. Thus, policy and program decisions based on this perspective can be more responsive to the social and cultural settings of programs. This brings us full circle from the perspectives discussed in Chapter 1. If the researcher starts from a deficit view of the minority population, the interpretation of the data will focus on the dysfunctions found within that community. If the researcher starts with a transformative perspective, the researcher will attempt to identify the broader cultural context within which the population functions. For example, African Americans are at greater risk than other populations for living in poverty. These circumstances are important because of the relationship between poverty and oppressive social experiences. Rather than focusing on the “deficit” of being raised in a single-parent household, researchers need to understand the notion of extended family and its particular importance for African American children. They also need to understand the contextual information about the experiences and characteristics of racial communities concerning the environments in which children live that make them less likely to be in a position of privilege with regard to access to social and educational opportunities.
Generalizability
As mentioned in previous chapters, external validity is defined in terms of the generalizability of the results of one particular research study to a broader population. Randomized sampling strategies are supposed to ensure that the results can be generalized back to the population from which the sample was drawn. Randomized sampling is not always possible, and therefore researchers need to be careful in the generalizations they make based on their results. When working with racial and ethnic minority groups or people with disabilities, generalizations about group differences are often put forward without much attention to within-group variation and the influence of particular contexts. Bledsoe (2008) describes this problem within the African American community in that many social prevention programs have been conceived from dominant middle-class perspectives, and many of these programs have been implemented in African American communities.
Although researchers sometimes acknowledge that culturally specific approaches are needed, there have been few serious efforts to design and evaluate programs based on culturally diverse perspectives. A notable exception is the work of Hood, Hopson, and Frierson (2015; Hood, Hopson, & Kirkhart, 2015) in culturally responsive evaluations. When researchers use inappropriate measurement strategies with minority groups, they can erroneously reach conclusions that the programs are or are not effective. The people who are hurt by such inappropriate conclusions are those who have the least power. J. E. Davis (1992) instructs researchers about the damage that can be done by relying completely on the results of data analyses without consideration of the contextual knowledge about the community and the participants who live there. He summarizes the problems with reliance on comparative statistical outcomes without proper sensitivity to contextual variables:
An enormous amount of information about the location and contexts of programs is missing from the discussion of programs’ causal claims. Often, knowledge of a program’s clientele and the program’s appropriateness for its environment is needed to advance thinking about the program’s causal assertions. Unfortunately for African Americans and other U.S. racial minorities, this information is, at best, partially known but discarded or, at worst, not known or even cared about. This is not to say that experimental and quasi-experimental studies are not useful for program evaluation; to the contrary. These methods are very powerful in revealing program effects, but results must be examined more carefully, and with sensitivity to diverse populations. (p. 63)
He warns that making the assumption that all African Americans are homogeneous in comparative studies leads to a lack of understanding of program effects for the diverse members of this community. More information is needed about relevant dimensions of diversity within African American communities. He concludes: “African Americans are the largest racial minority in this country, but much within-group variation and in-depth understanding will be completely lost with traditional race-comparative analysis in program evaluation” (p. 63).
Options for Reporting Statistical Results
The APA recommendations concerning reform in the use of tests of statistical significance included reporting effect sizes, confidence intervals, and using graphics (Appelbaum et al., 2018). These were not entirely new recommendations, as researchers have offered a number of options in the past for reporting statistical results of quantitative research that include effect sizes, percentage of variance accounted for, and examining within- and between-group differences.
Effect Size
For studies that have experimental and control groups, Vogt et al. (2014) provide the following explanation of effect size: Effect size is calculated as a percentage of the standard deviation of the outcome measures (also known as Cohen’s d), to facilitate the comparison of findings across outcomes that are measured on different scales. An effect size is a way of representing the size of a treatment’s effect in standardized units that then allows for comparisons across studies that might use different outcome measures. An effect size can be calculated to capture the difference between the means for experimental and control groups, calculating the distance between the two group means in terms of their common standard deviation. Thus, an effect size of 0.5 means that the two means are separated by one half of a standard deviation. This is a way of describing how well the average student or client who received the treatment performed relative to the average student or client who did not receive the treatment. For example, if an experimental group of persons with behavioral problems received a drug treatment and the control group received a placebo, an effect size of 0.8 would indicate that the experimental group’s mean was 0.8 standard deviations above the control group’s mean.
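The calculation described above can be sketched as follows. This is a minimal illustration with hypothetical scores; it assumes that the “common standard deviation” is the pooled standard deviation of the two groups, which is one common choice but not the only one.

```python
import math
import statistics


def cohens_d(treatment, control):
    """Difference between group means in units of the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    var1 = statistics.variance(treatment)  # sample variance (n - 1 denominator)
    var2 = statistics.variance(control)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (statistics.fmean(treatment) - statistics.fmean(control)) / pooled_sd


# Hypothetical outcome scores for an experimental and a control group.
experimental = [14, 16, 18, 20, 22]
control = [10, 12, 14, 16, 18]
d = cohens_d(experimental, control)
print(round(d, 2))
```

Because the result is expressed in standard deviation units, it can be compared with effect sizes from studies that used different outcome measures.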
An increasing number of journals in education and psychology are now requiring that researchers include effect size in their submissions. In relation to effect size reporting, APA (Appelbaum et al., 2018) recommends that researchers always include an effect size when reporting statistical significance. In Chapter 3, I described how to use effect sizes for meta-analysis, that is, a statistical synthesis of previously conducted research. Researchers should not blindly interpret the effect size based on magnitude alone; the judgment of significance rests with the researcher’s, user’s, and reviewer’s personal value systems; the research questions posed; societal concerns; and the design of a particular study. For more detailed information on effect size, the reader is referred to Greenland et al. (2016) or Vogt et al. (2014).
Confidence Intervals
Confidence intervals are used to indicate the degree of confidence that the data reflect for the population mean or some other population parameter (Greenland et al., 2016). Confidence intervals are frequently seen in mainstream media in prediction of election outcomes in the form of a percentage of people who agree or disagree on a candidate, plus or minus a certain percentage to indicate the range of values and the level of confidence that the range includes the population parameter. Because of sampling error, researchers expect the mean to vary somewhat from sample to sample. Most commonly used statistical software packages will compute confidence intervals. APA’s manual states, “Because confidence intervals combine information on location and precision and can often be directly used to infer significance levels, they are, in general, the best reporting strategy. The use of confidence intervals is therefore strongly recommended” (APA, 2009, p. 34).
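The election poll example can be worked through in a few lines. The sketch below uses hypothetical poll numbers and assumes the simple normal approximation for a proportion from a random sample; statistical software packages may use other, more refined methods.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a sample proportion."""
    p = successes / n
    # z value that leaves (1 - confidence) / 2 in each tail, about 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


# Hypothetical poll: 520 of 1,000 respondents favor a candidate.
low, high = proportion_confidence_interval(520, 1000)
print(round(low, 3), round(high, 3))
```

Here the sample result of 52% comes with a margin of roughly plus or minus 3 percentage points, so the interval runs from about 49% to 55% and includes values below one half.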
Correlational Research and Variance Accounted For
Correlational results can be interpreted in different ways. For example, a researcher could say that men and women scored significantly differently (p < .01) on a scale. Or they could say that many variables were tested to determine their relationship with performance on a test, gender being one of those variables. If gender accounts for only 3% of the variance and other predictor variables such as training or experience account for 73% of the variance, then you would interpret the results very differently. The amount of variance accounted for by such background characteristics as race and gender is powerful information; additional manipulable variables (such as training and experience) can provide a clearer picture of what accounts for differential performance on dependent measures.
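For a single predictor, the proportion of variance accounted for is the square of the correlation coefficient. The following sketch uses hypothetical data and a hand-written Pearson correlation so that each step is visible; the variable names are illustrative only.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical data: hours of training and scores on a test.
training_hours = [1, 2, 3, 4, 5]
test_scores = [2, 4, 5, 4, 5]

r = pearson_r(training_hours, test_scores)
variance_accounted_for = r ** 2
print(round(r, 2), round(variance_accounted_for, 2))
```

In this made-up example the correlation is about .77, so training accounts for about 60% of the variance in test scores; a predictor with a statistically significant but small correlation would account for far less.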
Replication
When the data do not meet the assumptions necessary for reaching conclusions about statistical significance, APA (2009) recommends that researchers replicate the study’s results as the best replacement for information about statistical significance. Building replication into research helps eliminate chance or sampling error as a threat to the internal validity of the results. This also emphasizes the importance of the literature review, discussed in Chapter 3, as a means of providing support for the generalizability of the results. However, Mark and Gamble (2009) note that replication is not always or even often a viable option:
The false confession problem also would be solved if the experimenter could snoop in the data from one study and then, if an important finding emerges (e.g., a program benefits one subgroup and harms another), see if the pattern replicates in another study. Unfortunately, options for replication are limited in many areas of applied social research. For example, some program evaluations take years and cost millions of dollars; so replication is an infeasible strategy relative to the timeline for decision making. (p. 210)
Use of Nonparametric Statistics
Nonparametric statistics provide an alternative for researchers when their data do not meet the basic assumptions of normality and randomization, they have small samples, or they use data of an ordinal or nominal nature. We saw this in the Huang (2018) study of statistical anxiety described earlier in this chapter.
Competing Explanations
No matter what design, paradigm, or type of data collected, researchers always need to consider competing explanations. (In the postpositivist paradigm, these are called threats to internal and external validity.) Such competing explanations become critical when it is time to interpret the results. Lister-Landman et al. (2017) explored reasons females might engage in compulsive texting more than males, such as a preoccupation with interpersonal relationships. “Perhaps engaging in compulsive texting reflects females’ preoccupation with intimacy in relationships that interferes with academic tasks (e.g., homework, studying) to the extent of impairing academic adjustment” (p. 322).
McCarthy et al.’s (2017) study of the effect of an intervention to address adolescent depression showed no significant effects. They noted that students who reported a decrease in their depressive symptoms in both the experimental and control groups fared better academically. They suggested the following competing explanation:
It does not suggest that depression prevention programs are the panacea for school performance and functioning among adolescents at risk of depression … Given the inconsistency of findings, depression prevention researchers should continue to examine program effects on objective measures of school performance and functioning. Positive findings may provide an important means of advocating for the provision of depression prevention programs in schools. However, if additional research on these programs continues to produce similar results on school-related variables, it will suggest that they are too brief and targeted to impact these secondary outcomes. (p. 10)
Recognizing Limitations
As should be clear by now, it is not possible to design and conduct the “perfect” research study in education or psychology. Therefore, it is incumbent on the researcher to recognize and discuss the limitations of a study. Reasons for a study’s limitations are many, such as characteristics of the sample (e.g., all middle-class Caucasian students), measurement instruments (e.g., using grades as academic outcomes, because grades can vary across teachers and schools), sample size (too small), or not including important variables in the study. For example, Lister-Landman et al. (2017) acknowledged that they had not investigated students’ motivation for texting; thus, they were unable to explore the possibility that females’ texts are more associated with preoccupation with interpersonal relationships than males’ texts. In Sirinides et al.’s (2018) study of Reading Recovery, the authors noted that “A final limitation of this study is its inability to explain substantial variation in program effect that were observed across schools” (p. 16).
Principled Discovery
Mark and Gamble (2009) propose that a strategy called principled discovery offers promise for addressing many of the challenges associated with statistical significance testing:
In short, the idea is to complement traditional analyses of experimental and quasi-experimental research with iterative analysis procedures, involving both data and conceptual analysis. The goal is to discover complexities, such as unpredicted differential effects across subgroups, while not being misled by chance. (p. 210)
Remember that statistics are probability based and that conducting multiple analyses with the same data set increases the chance of finding something significant. “Principled discovery has been offered as a way to try to discover more from the data while not being misled by false confessions” (p. 210). Mark and Gamble suggest that principled discovery begins with testing a prior hypothesis, such as the new treatment will result in improved performance compared to the old treatment.
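The way multiple analyses raise the chance of a false confession can be illustrated with a simple calculation. The sketch assumes that the tests are independent and that no real effects exist, which is a simplification; analyses run on the same data set are usually correlated, so the figures are only a rough guide. The function name is hypothetical.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one significant result when no real effects exist,
    assuming the tests are independent."""
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 5, 10, 20):
    print(n_tests, round(familywise_error_rate(n_tests), 2))
```

Under these assumptions, 10 tests at the .05 level carry about a 40% chance of at least one spurious significant finding, and 20 tests about a 64% chance.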
The principled discovery that follows involves two primary stages (which may further iterate):
In the first stage, the researcher carries out exploratory analyses. For example, an experimental program evaluator might examine whether the program has differential effects by looking for interaction effects using one after another of the variables on which participants have been measured (e.g., gender, race, age, family composition). A wide variety of statistical techniques can be used for the exploratory analyses of this first stage of principled discovery (Mark, 2003; Mark et al., 1998).
If the Stage 1, exploratory analyses result in an interesting (and unpredicted) finding (and if replication in another study is infeasible), then in the second stage of principled discovery the researcher would seek one or another form of independent (or quasi-independent) confirmation of the discovery. In many instances, this will involve other tests that can be carried out within the same data set (although data might be drawn from other data sets, or new data might be collected after Stage 1). For example, if a gender effect is discovered in an educational intervention, this might lead to a more specific prediction that boys and girls will differ more after transition to middle school than before. As this example illustrates, Stage 2 of principled discovery includes conceptual analysis as well as data analysis. That is, the second stage of principled discovery will generally require an interpretation of the finding from the Stage 1 exploration. (Mark & Gamble, 2009, pp. 210–211)
Mark and Gamble (2009) explain that this two-stage process protects against tortured data confessing falsely because, if the effect observed in Stage 1 is purely due to chance, then there is no expectation that it will occur again in Stage 2:
For example, if a gender effect from Stage 1 had arisen solely due to chance, it would be unlikely that the size of the gender difference would be affected by the transition to middle school. Principled discovery has considerable potential for enhancing emergent discovery in quasi-experimental (and experimental) research, while reducing the likelihood of being misled by chance findings. Despite its potential benefits, important practical limits will often apply to principled discovery. These include the possibility that in some studies data elements will not be available for the second stage of principled discovery, as well as the likelihood that statistical power may be inadequate for some contrasts (Mark, 2009). Despite such limits, principled discovery appears to be an approach that can help address some of the practical and ethical objections to randomized and quasi-experimental studies of the effects of interventions in a complex world. (Mark & Gamble, 2009, p. 211)
Extending Your Thinking
Statistical Analysis
How can sample size influence statistical significance? Why is this particularly important in special education, general education, and psychological research?
Why is randomization an important consideration in the choice of a statistical test? Why is this particularly important for research that uses small, heterogeneous, or culturally diverse samples?
What can a researcher do when the basic assumptions for parametric inferential statistics are not met?
Qualitative Analytic Strategies
As mentioned before but repeated here for emphasis, data analysis in qualitative studies is an ongoing process. It does not occur only at the end of the study, as is typical in most quantitative studies. The fact that the topic is explored in depth here is simply an artifact of the way the human brain works. It is not possible to learn about everything all at once. So realize that analysis in qualitative studies designed within the ethnographic or phenomenological traditions is recursive; findings are generated and systematically built as successive pieces of data are gathered (Thornberg & Charmaz, 2014).
Qualitative data analysis has sometimes been portrayed as a somewhat mysterious process in which the findings gradually “emerge” from the data through some type of mystical relationship between the researcher and the sources of data. Anyone who has conducted in-depth qualitative analysis will testify that a considerable amount of work occurs during the data collection and analysis phases of a qualitative study.
Several analytic strategies can be used in qualitative data analysis, including grounded theory, narrative, and discourse analysis (Wertz et al., 2011). In addition, Miles, Huberman, and Saldana (2014) prepared a sourcebook that can be used to guide the novice researcher through the process of qualitative data analysis that involves data reduction through the use of various kinds of matrices. Researchers can also choose to analyze data the old-fashioned way (cut and paste pieces of paper) or by using one of the many computer-based analysis programs that have grown in popularity over the years. More details about these options are presented later in this chapter.
Steps in Qualitative Data Analysis
Bazeley (2013) provides a step-by-step description of data analysis strategies for qualitative data, despite acknowledgment that undertaking such a task sets up an oxymoronic condition in that qualitative data analysis in essence defies a step-by-step approach. Yet as mere mortals, we are faced with the task of starting somewhere; thus, we begin by taking a look at our data as it is collected.
Step 1: Preparing the Data for Analysis
This step assumes that the researcher has been reviewing and reflecting on the data as it is collected. How this is done depends to some degree on the type of data collected and the method of collecting and recording the data. For example, if the researcher used video or audio taping, then questions arise about the transcription of the data: Should all the data be transcribed? If not, how are decisions made about which parts should be transcribed and which should not? How will the researcher handle nonverbal behaviors or elements of the interviews such as laughter or pauses, emotions, gestures? Should the researchers themselves do the transcription? I advise researchers to undertake the process of transcription themselves because this is part of the data analysis process engendered by interacting with the data in an intensive and intimate way. If someone else does the transcription, then the researcher should listen to the audio or video with the transcripts to build this intimate relationship with the data and to ensure accuracy. As the questions raised in this paragraph illustrate, transcription is not a transparent process. Researchers bring their own point of view to the process, including noting multiple meanings that lie in what might appear to be simple utterances.
While this higher level of thinking is happening, researchers should also take care of practical considerations, such as being sure their notes and files of data are well labeled and organized to facilitate data analysis processes and accurate reporting of the results.
Steps 2 and 3: Data Exploration Phase and Data Reduction Phase
These two phases are synergistic: As you explore your data, you will be thinking of ways to reduce it to a manageable size that can be used for reporting. Exploring means reading and thinking and making notes about your thoughts (called “memoing” by the qualitative research community). Memos can take the form of questions about meanings, graphic depictions of how you think data relate to each other, or important quotes that you want to be sure you don’t lose track of during the analysis process. The data reduction occurs as you select parts of the data for coding—that is, assigning a label to excerpts of data that conceptually “hang together.” Saldana (2013) provides a handy manual to guide qualitative researchers in their coding process. Box 13.3 contains an example of codes used in a study, along with an excerpt from the codebook. The first few lines of this example provide the information about the source of the data. The comments on the side are the codes used to identify portions of text that exemplify relevant concepts.
Box 13.3
Codes and Codebook Excerpts
Field Notes by Donna Mertens, evaluator, Project Success Teachers Reflective Seminar
May 11, 2007, 9AM–noon
Facilitator (F) introduced the evaluation team and their purpose.
F: Divide into pairs and discuss your life experiences. Then we’ll come back together and each of you will describe one “WOW” experience and one challenging experience.
Observation: the 2 current students did not interact; F talked with one teacher. Four African American teachers; 6 white female teachers; 1 white female grad student.
WOW and Challenges:
T5: My students are under 5 years old and they come with zero language and their behavior is awful. They can’t sit for even a minute. Kids come with temper tantrums and running out of the school building. I have to teach these kids language; I see them start to learn to behave and interact with others. My biggest challenge is seeing three kids run out of school at the same time. Which one do I run after? One kid got into the storm drain. I’m only one teacher and I have an assistant, but that means there is still one kid we can’t chase after at the same time as the other two.
T7: I had my kids write a letter to their teachers to express their appreciation. That was a WOW experience. Challenge—I don’t like to be negative. My real challenge is that I feel kind of isolated because of being in a separate building—other teachers don’t even know who I am.
T6: I had a student from a Spanish speaking country. He struggled to pick up ASL. Once he started picking it up, he was very quick. I noticed this student is now asking for more challenge; he participated in his IEP meetings. Now he is going to general education classes; I can’t remember any kids from special ed going to general ed before.
Challenge: I feel teachers in the mainstream resist our students, especially students with multiple disabilities.
T8: I teach students who are mainstreamed for 2 or 3 classes; the school sees my students as being very low, but my students want to know things.
T3: My WOW and challenge are really the same thing. When I graduated, I thought I was ready to teach. Then the principal gave me my list of students and my classroom and just washed his hands of me. You’re on your own. The principal did not require me to submit weekly plans like other teachers because he thought I’d only be teaching sign language. But I told him, I’m here to really teach. We (my students and I) were not invited to field day or assemblies. That first year really hit me—what a challenge and a WOW at the same time. So I changed schools and this one is definitely better. Now I’m in a school where people believe that deaf students can learn.
T4: I have 6 students in my classroom. There are no other teachers there to help me. My class has kids who use 3 different methods of communication: sign, oral, and cued speech. I tried to explain in sign, but the other kids don’t understand. I was always saying: what should I do? I have a co-teacher in the afternoon, but she doesn’t really support me. So I told her I needed her to really help me. So she works with one kid who got a cochlear implant. He can say his name now, but nothing else.
Codebook Excerpts:
ASL—American Sign Language
ESL—English as a Second Language
ChgScSys—Changed school systems
DivBhvr—Diverse Behavior—Teachers’ experience challenges because of diverse student behavior. Includes behavioral issues.
DivComm—Diverse communication modes—combination of communication modes in classroom—sign, oral, CI [cochlear implant], cued speech.
Iso—Teacher feels isolated.
Low—Low expectations for students.
NoSup—No support system.
WOW—Something wonderful that happened to the teachers.
Codes noted in the margin of the field notes, in order of appearance: DivBhvr; WOW; Iso; ESL, ASL, DivComm; Low; Low, ChgScSys, NoSup
SOURCE: Field Notes prepared by Mertens for the Project SUCCESS evaluation (Mertens, Holmes, Harris, & Brandt, 2007).
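The codebook and coded excerpts in Box 13.3 can also be kept in a simple data structure, which makes it easier to check that every code applied is defined in the codebook and to retrieve all excerpts that share a code. The following is a minimal sketch in Python, not the procedure used in the original study. The codebook entries are taken from Box 13.3; the pairing of speakers with codes in `coded_segments` and the function names are assumptions made for illustration only.

```python
from collections import Counter

# Codebook entries taken from Box 13.3 (code -> brief description).
codebook = {
    "DivBhvr": "Teachers experience challenges because of diverse student behavior",
    "DivComm": "Combination of communication modes in classroom",
    "Iso": "Teacher feels isolated",
    "Low": "Low expectations for students",
    "NoSup": "No support system",
    "ChgScSys": "Changed school systems",
    "WOW": "Something wonderful that happened to the teachers",
}

# Hypothetical pairing of speakers with codes, for illustration only;
# this is not the coding reported by the evaluators.
coded_segments = [
    ("T5", ["DivBhvr"]),
    ("T7", ["WOW", "Iso"]),
    ("T3", ["Low", "ChgScSys", "NoSup"]),
    ("T4", ["DivComm", "NoSup"]),
]


def tally_codes(segments, codebook):
    """Count how often each code is applied; reject codes missing from the codebook."""
    counts = Counter()
    for speaker, codes in segments:
        for code in codes:
            if code not in codebook:
                raise KeyError(f"Code {code!r} used for {speaker} is not in the codebook")
            counts[code] += 1
    return counts


def segments_for(code, segments):
    """Return the speakers whose excerpts carry the given code."""
    return [speaker for speaker, codes in segments if code in codes]


code_counts = tally_codes(coded_segments, codebook)
no_support_speakers = segments_for("NoSup", coded_segments)
```

Retrieving all excerpts under one code in this way supports the comparison of data with data; deciding what the excerpts mean remains the researcher's task.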
Thornberg and Charmaz (2014) provide a detailed description of a two-phase coding strategy that is used within the grounded theory method of data analysis: initial coding and focused coding. (Grounded theory is not the only analytic strategy for qualitative data; however, it does reflect many of the characteristics of qualitative data analysis that are common across other approaches.) Corbin and Strauss (2008) use the terms open coding and axial coding instead of Charmaz’s initial and focused coding. (See Box 13.4 for brief descriptions of three other data analysis approaches: narrative, phenomenological, and discourse analysis.) In the initial coding phase, the researcher codes individual words, lines, segments, and incidents. The focused coding phase involves testing the initial codes against the more extensive body of data to determine how resilient the codes are in the bigger picture that emerges from the analysis. The development of codes can be used to form the analytic framework needed for theory construction.
Miles, Huberman, and Saldana (2014) identify three elemental methods for coding: descriptive coding, in vivo coding, and process coding. Descriptive codes consist of a word or short phrase that summarizes the topic of a short passage of text. In vivo coding consists of words taken directly from the transcript; that is, codes based on the participant’s own words. Process codes denote an action that is reflected in the data and are generally in the form of a gerund (i.e., they are verbs that end in “ing”). The authors identify 25 different types of coding that reflect emotions, evaluation, drama, exploration, procedural, and grammatical aspects of the data.
Thornberg and Charmaz (2014) give this advice for the initial coding phase:
Remain open
Stay close to the data
Keep your codes simple and precise
Construct short codes
Preserve actions
Compare data with data
Move quickly through the data
Box 13.4
Narrative and Discourse Analysis Approaches
Narrative Analysis
Narrative analysis focuses on the stories, whether they are told verbally or in text or performance formats (Esin, Fathi, & Squire, 2014). The researcher tries to identify the content, structure, and form of life stories based on the available data. This might take the form of a biography or autobiography, a timeline of important life events, or an exploration of the meaning of life events within a broader sociocultural context. Narratives can be analyzed as separate voices to reveal diversity of perspectives of an issue. As a researcher approaches narrative analysis, it is important to consider whose story will be told and how the person’s story will be represented. Arts-based researchers (Archibald & Gerber, 2018) have contributed to the analytic processes in narrative analysis and extended this to include analysis of such genres as poetry, music, dance, and the visual arts.
Phenomenological Analysis
Phenomenology is focused on comprehending the essence of the participants’ experiences. There are a number of approaches for phenomenological analysis; however, they are all focused on the disclosure of phenomena in consciousness (Willis, Sullivan-Bolyai, Knafl, & Cohen, 2016). A unique characteristic of phenomenological analysis is the use of bracketing to aid researchers in suspending their beliefs, that is, “to lay aside what is known to apprehend lived experience with fresh eyes without predetermined judgements, biases, and answers” (p. 1189), so that they can allow the disclosure of the unique lived experiences of the participant. Bracketing means “holding one’s ideas in abeyance in phenomenological research” (p. 1189). This process allows researchers to identify their own views of the phenomenon before they engage with the participant and the resulting data. Willis et al. (2016) described the phenomenological analysis strategies they used in a study designed to understand middle school students’ lived experience of being bullied. They describe bracketing their theories related to anxiety, psychosocial developmental frameworks, and personal experiences with the phenomenon. The analysis included identifying meaning units that reflected the lived experiences of the participants and developing a web of relationships among the meaning units.
Discourse Analysis
Discourse analysis focuses on understanding how language constructs and mediates psychological and social realities (Willig, 2014). Discourse analysts pay particular attention to language and choice of words used to express a thought, feeling, or experience. Different approaches to discourse analysis emphasize different aspects of the meaning of language. For example, some focus on the role of language in maintaining power relations, while others are interested in the role of everyday conversations and their role in defining social worlds.
The basic analytic process involves examining three dimensions (Fairclough, 2003):
Analysis of the text, which involves the study of language structures such as use of verbs; use of statements, questions, or declarations; and the thematic structure
The analysis of discursive practice, which involves how people produce, interpret, disseminate, and consume text
The analysis of sociocultural practice, which involves issues of power in the discourse context and its implications in wider society
Brown, Bloome, Morris, Power-Carter, and Willis (2017) provide a review of research that used discourse analysis to study classroom conversations in the study of race and their role in the disruption of social and educational inequalities.
From my experience, I would add, involve team and community members in the coding when appropriate; discuss differences in interpretation and use of the codes (Mertens, 2009). Allow the codes to emerge and be revised as necessary, especially in the early stages of coding. Explore differences in interpretations; this can be an opportunity for surprising discoveries. Make a codebook that includes brief descriptions of each code. Having such a codebook facilitates the employment of a constant comparative method of analysis. Corbin and Strauss (2008) provide this definition of constant comparative analysis:
Comparing incident against incident for similarities and differences. Incidents that are found to be conceptually similar to previously coded incidents are given the same conceptual label and put under the same code. Each new incident that is coded under a code adds to the general properties and dimensions of that code, elaborating it and bringing in variation. (p. 195)
This is the bridge to the focused phase of coding, which Thornberg and Charmaz (2014) describe as the means to sift through large amounts of data and the many codes produced in the initial coding phase. The researcher needs to make decisions about which of the initial codes capture and synthesize the main themes present in the data. If a researcher is using a grounded theory approach, the focused coding will lead to identifying relations among the coding categories and organizing them into a theoretical framework. You validate the hypothesized relationships with the data available to you and fill in categories that need further refinement and development. This step is integrative and relational; however, Corbin and Strauss (2008) note that the analysis that occurs at this stage of the study is done at a higher, more abstract level. During this phase of analysis, the researcher identifies the core category or story line and then relates the subsidiary categories to the core through a model. (Corbin and Strauss use the term paradigm; however, I use the term model because paradigm has a different meaning in this book.) The model includes an explication of the conditions, context, strategies, and consequences identified in the coding phase. You then validate your theory by grounding it in the data; if necessary, you seek additional data to test the theory.
The coding example shown previously in this chapter, from the study of teachers who work with students who are Deaf and have an additional disability, illustrates how the transformative lens was brought into the analysis to support the focused coding phase (Mertens, Holmes, Harris, & Brandt, 2007). We were able to see connections between social justice issues in teacher preparation and support in early years of teaching based on the challenges that new teachers faced in terms of being marginalized at their schools (manifested by low expectations from administrators and exclusion from mainstream school activities) and a need to fully address language diversity (e.g., home languages other than English; use of a variety of languages and communication modes such as American Sign Language, speaking English while using ASL signs, and cued speech; use of a variety of assistive listening technologies such as cochlear implants and hearing aids). If all students have a right to an education, how can colleges of education prepare teachers who can respond appropriately to these educational challenges?
Theoretical Lenses and Qualitative Data Analysis
In Chapter 8, you read about various theoretical lenses that are associated with the transformative paradigm and that have been applied in qualitative research, such as feminist, critical race, LatCrit, disability rights, deafness rights, and critical theory. You will recall that use of such a theoretical lens influences the kind of research questions and data that you collect. So it is logical to expect that analytic strategies are also influenced by the use of such theoretical lenses. Milner (2012) describes the use of narrative and counter-narrative analytic tools in order to challenge oppressive belief systems that represent dominant views. In Milner’s study, he was challenging negative stereotypes of Black teachers by analyzing the counter-narratives of the Black teacher in his study. Using critical race theory (CRT), he used counter-narratives
to share a teacher’s experiences in ways that have not necessarily been told because it provides a different picture into the complexities of teaching and learning…. Race and racism are placed at the center through the narrative and counter-narrative through a critical race theory framework of analysis. (p. 28)
Using counter-narrative analytic strategies, Milner was able to contrast stereotypes, such as Black teachers are too harsh and authoritarian, with the teacher’s explanation that she accepted her responsibility for the development of her students, not just academically but holistically. She kept students engaged throughout class time and accepted no nonsense; she prioritized students’ learning that they are responsible members of their community with an obligation to serve and improve their community.
Additional examples of counter-narrative analysis are available in Caton’s (2012) study of Black males’ experiences with zero-tolerance policies that led to their dropping out of high school. Zero-tolerance policies were instituted to ensure a safe environment in schools. However, the policies disproportionately result in Black males being expelled from or dropping out of high school. The use of counter-narratives allowed Caton to “challenge the contemporary ideology of color blindness and the notion of a school-to-prison pipeline…. This framework could illuminate what it means to be Black and male in an urban school, pursuing an education under a burden of suspicion” (p. 1063). Ellison (2017) also provides an example of counter-narrative analysis in her study of urban African American mothers’ experiences with the Common Core State Standards using a CRT theoretical framework.
The use of a feminist theoretical lens provides another example of how theory influences analysis. Opara (2018) used Black feminist theory in her study of the lived experiences of African American female adolescents and their susceptibility to HIV/AIDS and STD. This lens led to consideration of the historical struggle that Black women have faced because of multiple sources of oppression (race, ethnicity, gender, sexual orientation, and class). The study also considered the intersectionality of these sources of oppression as a means to identifying strategies that could be effective to reduce their health risks.
Using Computers in Qualitative Analysis
Because qualitative studies tend to result in mountains of data (literally), many researchers have turned to computerized systems for storing, analyzing, and retrieving information. Presently, a large number of computer programs are available (e.g., ATLAS/ti, The Ethnograph, HyperRESEARCH, MAXQDA, and NVivo), and this is an area in which rapid changes are occurring. Hence, I will not recommend any particular software but instead refer you to a website that I have found to be useful called Computer Assisted Qualitative Data Analysis that is maintained by the University of Surrey where you will find analyses of various programs as well as breaking news in this area (https://www.surrey.ac.uk/computer-assisted-qualitative-data-analysis/resources). There is also a special issue of The Qualitative Report (https://nsuworks.nova.edu/tqr/vol23/iss13/1/) on the future of qualitative data analysis software that contains many articles that reflect on the advantages and disadvantages of different software packages.
Before making a decision about which software program to use, you should review the resources cited in this chapter (as well as any more recent developments in this rapidly changing field). You need to pick a system that is compatible with your hardware as well as with your research purposes. One caution: No matter how attractive the software, nothing should separate you from active involvement with your data. Qualitative data analysis is really about you thinking about your data and hypothesizing possible relationships and meanings. A computer can be an important aid in this process, but you should not let it become a means of separating you from the process of knowing what your data have to say.
Interpretation Issues in Qualitative Data Analysis
Triangulating Data
Triangulation, as it was discussed in Chapter 8, involves the use of multiple methods and multiple data sources to support the strength of interpretations and conclusions in qualitative research. As Guba and Lincoln (1989) note, triangulation should not be used to gloss over legitimate differences in interpretations of data; this is an inaccurate interpretation of the meaning of triangulation. Such diversity should be preserved in the report so that the “voices” of the least empowered are not lost. Richardson and St. Pierre (2005) suggest that a better metaphor for this concept is crystallization; Mertens (2009) suggested the metaphor of a prism. The crystal and the prism metaphors suggest multifaceted sources of data that are brought to bear on the interpretation of findings.
Audits
Two types of audits were described in Chapter 8: the dependability audit and the confirmability audit. Through the use of these two strategies, the researcher can document the changes that occurred during the research and the supporting data for interpretations and conclusions. The process of memoing discussed previously in the coding section has been noted to contribute to a researcher’s ability to make visible the decision-making trail that occurred during the course of the study. Memos can serve as an audit trail to document the progression of the study as well as changes that occurred and the context for those changes.
Bazeley (2013) emphasizes the importance of keeping an audit trail that documents your thinking and feelings as you proceed through the data analysis. You can consider such questions as the following when constructing your audit trail:
Are findings grounded in the data? (How does the sampling affect interpretation? Is any piece of data given excessive weight compared to others?)
Are inferences logical? (Are analytic strategies applied correctly? Are alternative explanations accounted for?)
Is the coding structure appropriate?
What are the justifications of inquiry decisions and methodological shifts? How did hypotheses change over the course of the analysis?
What is the degree of researcher bias (premature closure, unexplored data in field notes, lack of search for negative cases, feelings of empathy)?
What strategies were used for increasing credibility (community involvement, member checks, feedback to informants, peer review, adequate time in the field)?
Cultural Bias
The comments included in the section on cultural bias for quantitative research are equally appropriate when analyzing and interpreting qualitative research. The tendency to see things through your own cultural lens is recognized as a potential problem in qualitative research. Many of the safeguards discussed in Chapter 8 are useful for minimizing this source of bias or for recognizing the influence of the researcher’s own framework. You should begin by describing your own values and cultural framework for the reader. Then you should keep a journal or log of how your perspectives change through the study. Discussing your progress with a peer debriefer can enhance your ability to detect when your cultural lens is becoming problematic. Conducting member checks with participants who are members of the culture under study can help you see where divergence in viewpoints may be based on culturally different interpretations.
Generalization/Transferability
Differences of opinion exist in the qualitative research community with regard to claims that can be made about the generalizability of the findings. Recall that generalizability is a concept that is rooted in the postpositivist paradigm and technically refers to the ability to generalize results of research conducted with a sample to a population that the sample represents. In qualitative research, Guba and Lincoln (1989) proposed that the concept of transferability would be more appropriate. With this approach, the burden of proof for generalizability lies with the reader, and the researcher is responsible for providing the thick description that allows the reader to make a judgment about the applicability of the research to another setting. Stake (2006) offers the opinion that case studies can be conducted with no intention to generalize to other situations; these cases are studied because of intrinsic interest in that case. He also recognizes that case studies are sometimes undertaken to be able to describe a typical situation, so the case provides an opportunity to learn that may provide insight into other similar situations, or multiple case studies might be undertaken to demonstrate the range of commonality and diversity of a phenomenon. Aggregating across cases must be done cautiously and without loss of the uniqueness in the context of each case.
Member Checks
As mentioned in Chapter 8, member checks can be used during the process of data collection. They have a place at the stages of data analysis and writing as well. Birt, Scott, Cavers, Campbell, and Walter (2016) note that repetitive use of member checks at different phases of the study provides a method of increasing validity. They describe a study of patients who were diagnosed with melanoma in which member checks were used throughout, including to determine whether the results had resonance for the participants. They indicated that this enabled them to validate the results through the participants’ eyes. They do acknowledge that some researchers find the use of member checks at this stage to be problematic because the researcher has expertise in research and theory that the participants do not have.
Analytic and Interpretive Issues in Mixed Methods
Analytic and interpretive issues in mixed methods research are influenced by the researcher’s paradigm and the design of the study. If a sequential design is used, it is more likely that the data analysis of one type of data will precede that of the other type, with little consequence for integrating the two types of data. However, it is possible that researchers might want to use the data from both types of data collection to inform their conclusions, and this would be especially true in concurrent mixed methods designs. Hence, discussion of analytic and interpretive issues here focuses on the nexus at which the two types of data actually meet and are intended to have an interactive relationship with the intention of seeing how they might inform each other.3 This is the strategy used in the study of parents and their Deaf children (see Chapter 10, Meadow-Orlans, Mertens, & Sass-Lehrer, 2003). A national survey resulted in quantitative data that were disaggregated to determine characteristics of parents associated with different levels of satisfaction with early intervention services. These data were used to identify subgroups of parents from whom qualitative data would be collected by means of in-depth interviews or focus groups. In the final analysis and interpretation phase of the study, the quantitative data were used to provide a picture of each subgroup of parents, while the qualitative data provided a richer explanation of their experiences with their young Deaf or hard-of-hearing child.
Integration of the quantitative and qualitative elements of a study is the hallmark of mixed methods research. This integration can come at different phases of the study; however, in this section, I focus on the integration of quantitative and qualitative data at the analytic stage of the study. Bazeley (2018) provides general guidance for mixed methods analysis (as well as several analytic strategies for specific mixed methods designs). Many of the steps in this guidance hold true for all types of data:
Preparing the data for analysis means entering the numerical data into appropriate software in a format that allows for quantitative analysis. Qualitative data should be transcribed and checked. Data from all sources should be organized and labelled.
Exploring the data: calculate descriptive statistics and consider graphic displays of quantitative data; read through and examine the qualitative data and proceed with coding; keep notes about potential spaces for integrating data from different sources.
Put the data sets into conversation with each other; identify common concepts or themes; compare and synthesize data based on the emerging areas of integration.
Compare the responses across different data sets. Connect quantitative responses (e.g., group membership) with qualitative data to ascertain a deeper meaning of the data.
Look for relationships between variables and codes to explore alternative explanations.
Develop reports that illustrate the data integration by data displays and source of evidence.
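The step of connecting quantitative responses (e.g., group membership) with qualitative data can be sketched as a small cross-tabulation of codes by group. The example below is a minimal illustration in Python rather than a procedure taken from Bazeley (2018). All participant identifiers, group labels, and code names are hypothetical, as is the function name `codes_by_group`.

```python
from collections import defaultdict

# Hypothetical quantitative data: participant -> satisfaction group from a survey.
survey_group = {"P1": "high", "P2": "low", "P3": "low", "P4": "high"}

# Hypothetical qualitative data: participant -> codes assigned to their interview.
interview_codes = {
    "P1": ["support", "access"],
    "P2": ["isolation", "access"],
    "P3": ["isolation"],
    "P4": ["support"],
}


def codes_by_group(groups, codes):
    """Count, for each group, how many participants had each code applied."""
    table = defaultdict(lambda: defaultdict(int))
    for participant, group in groups.items():
        # set() ensures a participant is counted once per code.
        for code in set(codes.get(participant, [])):
            table[group][code] += 1
    return {group: dict(counts) for group, counts in table.items()}


cross_tab = codes_by_group(survey_group, interview_codes)
for group in sorted(cross_tab):
    print(group, sorted(cross_tab[group].items()))
```

A display like this only points to where the two data sets may inform each other; the interview excerpts behind each cell still need to be read to ascertain the deeper meaning.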
Remember Berliner, Barrat, Fong, and Shirk’s (2008) mixed methods study of policies and practices to improve high school graduation for dropouts from Chapter 1 (Sample Study 1.4)? They investigated the quantitative patterns in reenrollment and subsequent graduation from high school. In addition, their qualitative data analysis provided them with insights as to why students reenroll (e.g., it is hard for a high school dropout to get a job). They also identified factors that challenge school districts when trying to reenroll dropouts. Demand at alternative high schools exceeds capacity, yet traditional high schools do not offer the interventions needed to support reenrollees. State funds were tied to enrollment and attendance rates; dropouts even with reenrollment result in fewer state dollars. This is complicated by the requirements for specific tests and other curriculum requirements needed for graduation. The combination of the quantitative results that support the effects of reenrollment in the long term with the identification of challenges from the policies in place at the state and district levels provides a basis for recommendations for improving graduation rates for this population of students.
A Research Plan: The Management Plan and Budget
A research plan is needed to outline the steps that must be taken to complete the research study within a specific time frame and to identify the resources needed to accomplish this complex task. The research plan consists of two parts: the management plan and the budget. Together, these can be used to guide you as you conduct your research and to monitor your progress.
The Management Plan
To develop a management plan, you need to analyze the major steps in the research process—for example, the literature review, design of the study, implementation of the study, and analysis and report of the study. Then, for each major step, you should list the substeps that you need to accomplish to conduct your research. A sample management plan is presented in Table 13.3. You can use the management plan to monitor your progress and to make any midcourse corrections that might be necessary. In addition, it can serve as a trail that documents your actions, which would allow researchers to audit your work, including any divergences from the original plan and the reasons for those changes. Particularly in qualitative research, the divergences are important because they are expected to occur. Thus, you are able to establish a trail of evidence for decisions that influenced the direction of the project and to support any conclusions you reached based on the research outcomes.
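One way to use a management plan for monitoring is to record each major step with its planned dates and compare them with the current date. The sketch below, in Python, uses the research functions and months from Table 13.3. The specific days of the month, the set of completed steps, the "today" date, and the function name `plan_status` are assumptions made for illustration.

```python
from datetime import date

# Steps and months follow Table 13.3; exact days of the month are assumed.
plan = [
    ("Literature review", date(2018, 9, 1), date(2018, 11, 30)),
    ("Instrument development", date(2018, 11, 1), date(2018, 12, 31)),
    ("Data collection", date(2019, 1, 1), date(2019, 3, 31)),
    ("Data analysis", date(2019, 3, 1), date(2019, 4, 30)),
    ("Reporting", date(2019, 4, 1), date(2019, 5, 31)),
]


def plan_status(plan, completed, today):
    """Label each step as done, overdue, in progress, or not started."""
    report = {}
    for task, start, end in plan:
        if task in completed:
            report[task] = "done"
        elif today > end:
            report[task] = "overdue"
        elif today >= start:
            report[task] = "in progress"
        else:
            report[task] = "not started"
    return report


# Hypothetical check in mid-January 2019 with only the literature review finished.
progress = plan_status(plan, completed={"Literature review"}, today=date(2019, 1, 15))
for task, state in progress.items():
    print(f"{task}: {state}")
```

An "overdue" label is a prompt for a midcourse correction; recording the reason for the delay alongside it contributes to the trail of evidence described above.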
If you are involved in a large-scale research project, it might be helpful to divide the work that will be done by people (defined in terms of positions—e.g., project director, research assistant) and identify the amount of time that each person will devote to the various activities. Such a person-loading chart can help with budgeting and justification for funding requests.
The Research Budget
Many dissertations and theses are written with very little or even no external funding. However, it is common for institutions to offer small grants to support such research or even for advanced students to seek outside funding. A sample budget sheet is presented in Table 13.4. General budget categories include personnel and nonpersonnel items. Under personnel, you should list all the people who will be involved in the research process, the amount of time each will be expected to work, and the rate at which each will be paid. Typical nonpersonnel items include travel, materials, supplies, equipment, communication expenses (telephone, postage), and copying. If you plan to pay people who participate in your research, that should be included in your budget as well. If you are seeking funds from an external source and you are located at a university, you will need to include indirect costs. This is typically a percentage of the overall amount requested that your institution will expect to receive to cover such costs as maintaining the building in which you have an office, the heat, the water, and other indirect costs of supporting you as you conduct your research.
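The arithmetic behind such a budget can be sketched in a few lines of Python. Every figure below is hypothetical, including the positions, hours, pay rates, and the 25% indirect cost rate. The sketch also assumes the indirect rate is applied to direct costs; institutions differ in the rate and in the base it is applied to, so check your own institution's policy.

```python
# Hypothetical personnel: (position, hours, hourly rate in dollars).
personnel = [
    ("Project director", 100, 40.0),
    ("Research assistant", 200, 15.0),
]

# Hypothetical nonpersonnel items, in dollars.
nonpersonnel = {
    "travel": 500.0,
    "materials and supplies": 300.0,
    "participant payments": 400.0,
    "copying": 100.0,
}

# Assumed rate; the real rate and its base are set by your institution.
INDIRECT_RATE = 0.25


def budget_total(personnel, nonpersonnel, indirect_rate):
    """Sum personnel and nonpersonnel costs, then add indirect costs on the direct total."""
    personnel_cost = sum(hours * rate for _, hours, rate in personnel)
    direct = personnel_cost + sum(nonpersonnel.values())
    indirect = direct * indirect_rate
    return {
        "personnel": personnel_cost,
        "direct": direct,
        "indirect": indirect,
        "total": direct + indirect,
    }


budget = budget_total(personnel, nonpersonnel, INDIRECT_RATE)
for category, amount in budget.items():
    print(f"{category}: ${amount:,.2f}")
```

Listing hours by position in this way also gives you the raw material for a person-loading chart.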
Table 13.3 Sample Management Plan for Research Proposal
Research Function | Subtasks | Person Responsible | Start Date | End Date
1. Literature review | 1.1 Identify key terms; 1.2 Identify databases; 1.3 Conduct electronic search; 1.4 Conduct hand search of journals; 1.5 Contact experts in the field, etc. | Research director | September 2018 | November 2018
2. Instrument development | etc.a | Research director | November 2018 | December 2018
3. Data collection | etc. | Research director | January 2019 | March 2019
4. Data analysis | etc. | Research director | March 2019 | April 2019
5. Reporting | etc. | Research director | April 2019 | May 2019