Because the methods were very similar for the classification of the free associates and of the film and television subtitles, we report the methods for both tasks before proceeding to the results for both datasets. We begin by describing the methods for the free associate classification task; for the subtitle classification task, we report only the ways in which it differed from the first task.
The second norming task was analogous to the first, except that instead of classifying which homonym meaning was evoked by a particular free associate, raters classified which homonym meaning was evoked by a line of dialog extracted from a corpus of movie and television subtitles. Except as described below, all methods for the second task were identical to those used in the first task.
Recall that in the SUBTL data, we extracted for rating a sample of 100 lines of subtitles for almost all homonyms, or all available lines for the few homonyms that appeared in fewer than 100 lines. As with the FAN data, to compute biggest for the subtitles data we included all lines for which the two raters agreed on the meaning classification, regardless of their confidence ratings. We grouped these rated lines for each homonym into subgroups according to the meaning classification offered by the raters. The estimate of biggest for each homonym was then simply the percentage of the lines falling in the meaning subgroup with the most lines.
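The following Python sketch illustrates one way this computation could be carried out; the input file and the column names (homonym, rater1_meaning, rater2_meaning) are illustrative assumptions and do not correspond to the published norms or the authors' scripts.

```python
# A minimal sketch of the "biggest" computation described above.
# The input file and column names are hypothetical, not the published norms.
import pandas as pd

# One row per rated subtitle line, with each rater's meaning classification.
ratings = pd.read_csv("subtitle_ratings.csv")  # hypothetical file

# Keep only lines on which the two raters agreed, regardless of confidence.
agreed = ratings[ratings["rater1_meaning"] == ratings["rater2_meaning"]].copy()
agreed["meaning"] = agreed["rater1_meaning"]

def biggest_estimate(lines: pd.DataFrame) -> float:
    """Percentage of agreed-upon lines in the largest meaning subgroup."""
    counts = lines["meaning"].value_counts()
    return 100.0 * counts.max() / counts.sum()

# One estimate per homonym: the relative frequency of its dominant meaning.
biggest = agreed.groupby("homonym").apply(biggest_estimate)
print(biggest.sort_values(ascending=False).head())
```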
Next, we explored the unique variance explained by measures of relative meaning frequency after controlling for the effects of other psycholinguistic variables. To do so, we first computed the residuals from multiple regressions that used the results from each mega-study as the dependent variable (ACC or RT) and included log10 word frequency (Brysbaert and New, 2009), orthographic Levenshtein distance (OLD; Yarkoni et al., 2008), number of phonemes, number of letters, number of syllables, number of senses, number of verb interpretations, number of noun interpretations, and letter bigram frequency as predictors. Except as cited above, all of these data were taken from the covariate data provided as part of the eDom norms (Armstrong et al., 2012). We then created simple regression models that used the different measures of relative meaning frequency to predict the residuals. In essence, this corresponds to a stepwise regression in which the meaning frequency estimate is added last. The results are presented in Table 2 for all available data for each meaning frequency estimate (Table 3 in the Appendix includes the analyses of the intersection data).
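As a concrete illustration of this two-step residualization, the sketch below fits the control regression, keeps its residuals, and then regresses those residuals on one relative meaning frequency estimate. The data file and variable names are assumptions for illustration only and do not reproduce the authors' analysis scripts.

```python
# A hedged sketch of the residualization analysis described above.
# The file name and column names are illustrative assumptions.
import pandas as pd
import statsmodels.formula.api as smf

items = pd.read_csv("item_level_data.csv")  # hypothetical merged item-level dataset

# Step 1: regress the mega-study dependent measure (e.g., lexical decision RT)
# on the control covariates and retain the residuals.
controls = (
    "log10_frequency + OLD + n_phonemes + n_letters + n_syllables + "
    "n_senses + n_verb_interpretations + n_noun_interpretations + bigram_frequency"
)
step1 = smf.ols(f"RT ~ {controls}", data=items).fit()
items["RT_residual"] = step1.resid  # aligned by index; rows with missing data stay NaN

# Step 2: simple regression of the residuals on a relative meaning frequency
# estimate (here, the 'biggest' measure); as in the text, this approximates
# entering the meaning frequency estimate last in a stepwise regression.
step2 = smf.ols("RT_residual ~ biggest", data=items).fit()
print(step2.rsquared, step2.params)
```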