Calculator
August 15, 2012 — 10:54

In phraseological studies it is often important to know whether the collocations of a keyword are statistically significant, i.e. whether the apparent “attraction” between a keyword and another word (a collocate) is a matter of chance or whether it is a “rule”. Few pieces of language analysis software can do this, but this calculator can.

Use it this way:

The upper fields captioned “Keyword”, “Collocate” and “Corpus” are only for practical information.
  1. Enter the number of tokens (words) in the corpus, and the total number of occurrences of the presumed collocate in the corpus (here 60152 and 23 respectively).
  2. Enter the “span” (i.e. the total number of words considered around the keyword; here 8, namely 4 words to the left of the keyword and 4 words to the right). Enter the number of concordance lines (here 99). Enter the number of occurrences of the presumed collocate in the concordance lines (here 5).
  3. Click the “Calc” button, and the results will appear in the Results fields.

  • Once the calculation has been done, the results can be copied to the clipboard, and pasted into any application that accepts text. (See an example below).
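
For those who would like to see roughly what goes on behind the “Calc” button, here is a minimal Python sketch. It is not the program's source code. It assumes the commonly used observed-versus-expected formulation, in which the expected number of co-occurrences is the collocate's relative frequency in the corpus multiplied by the span and by the number of concordance lines. The count entered in step 1 is taken to be the corpus frequency of the collocate. The function name `collocation_scores` and its parameter names are made up for this illustration, and the exact formulas used by the calculator may differ in detail (for instance in the Z-score denominator).

```python
import math

def collocation_scores(corpus_tokens, collocate_total, span, concordance_lines, observed):
    """Return (expected, mi, t_score, z_score).

    Assumed formulation (not taken from the calculator itself):
      expected = P(collocate) * span * concordance_lines
      MI       = log2(observed / expected)
      T        = (observed - expected) / sqrt(observed)
      Z        = (observed - expected) / sqrt(expected * (1 - P(collocate)))
    """
    if observed <= 0:
        raise ValueError("observed must be positive; log2 and the T-score are undefined otherwise")
    # Chance of meeting the collocate at any single word position in the corpus.
    p = collocate_total / corpus_tokens
    # Expected co-occurrences if the collocate were scattered at random
    # over the span * concordance_lines word slots around the keyword.
    expected = p * span * concordance_lines
    mi = math.log2(observed / expected)
    t_score = (observed - expected) / math.sqrt(observed)
    z_score = (observed - expected) / math.sqrt(expected * (1 - p))
    return expected, mi, t_score, z_score

# The figures from the steps above: 60152 tokens, 23 occurrences in the corpus,
# span 8, 99 concordance lines, 5 occurrences within the concordance lines.
expected, mi, t_score, z_score = collocation_scores(60152, 23, 8, 99, 5)
print(f"expected={expected:.3f}  MI={mi:.2f}  T={t_score:.2f}  Z={z_score:.2f}")
```

Under these assumptions the collocate would be expected well under once in the 99 lines, so finding it 5 times gives clearly positive scores. If the calculator reports somewhat different figures, the difference lies in the assumed formulas, not in the input.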

Ain’t got a clue as to the use of this – and still interested? Well, have a look at Jun Da’s article which explains it very well indeed.

A different Mutual Information:

  • Hanks et al.’s article does not include information about concordance lines, and consequently it uses a slightly different formula. This calculator can easily be used to calculate Mutual Information this way too if you set Span to 1 and Concordance lines to the number of occurrences of the keyword in the corpus. Using Hanks’ first example (strong + northerly) the screen should look thus:

And the textual output from this calculation will read:
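
Why the span-1 setting gives the same Mutual Information as the other formula can be checked with a few lines of Python. This is only a sketch: it assumes that the calculator works with an observed/expected ratio, and that the article's formula is the pointwise one, log2(P(x,y) / (P(x) · P(y))). The function names and the counts are hypothetical and chosen purely for illustration. They are not the figures from the strong + northerly example.

```python
import math

def mi_span(corpus_tokens, collocate_total, span, concordance_lines, observed):
    # Assumed calculator-style form: expected = P(collocate) * span * lines.
    expected = collocate_total / corpus_tokens * span * concordance_lines
    return math.log2(observed / expected)

def mi_pointwise(corpus_tokens, freq_x, freq_y, freq_xy):
    # Pointwise form, with probabilities estimated as relative frequencies.
    p_xy = freq_xy / corpus_tokens
    p_x = freq_x / corpus_tokens
    p_y = freq_y / corpus_tokens
    return math.log2(p_xy / (p_x * p_y))

# Hypothetical counts: corpus size, keyword frequency,
# collocate frequency, and frequency of the pair.
N, f_key, f_col, f_pair = 1_000_000, 2_000, 50, 10

# Span set to 1, concordance lines set to the keyword's corpus frequency.
via_span = mi_span(N, f_col, 1, f_key, f_pair)
via_pointwise = mi_pointwise(N, f_key, f_col, f_pair)
print(round(via_span, 4), round(via_pointwise, 4))
```

With span 1 and one concordance line per occurrence of the keyword, the expected count becomes f(keyword) · f(collocate) / N, which is exactly the denominator of the pointwise formula once the pair frequency is divided by N. That is why the two routes agree.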

Yes, I want to download T/Z-score and Mutual Information.

  • Once downloaded, unzip it, read the ReadMe file and double-click the .exe file.
  • If you find the program useful or if you have comments on it, a couple of words would be very much appreciated indeed, and you might even receive an update if it ever surfaces.
  • If you state a good reason, the source code is also available.