The Complexity Checker

Analyzing the complexity of a corpus is not a new idea. Gupta, T., Srivastava, B., & Bauer, A., in "A Corpus Complexity Analyzer for NLP Applications", show that such analysis has many different uses. Some users, for example, need to simplify their writing for business reasons so that it is more easily understood.

For EnglishGrammar.Pro, however, the focus is L2 writing, which has also been covered extensively in the literature. For example, p. 35 of

Kim, J. Y. (2014). Predicting L2 Writing Proficiency Using Linguistic Complexity Measures: A Corpus-Based Study. English Teaching, 69(4).

contains a long list of the measures used in that study.

Linguistic Complexity Features Measured for the Current Study

| Type | Measure (Code) | Formula |
| --- | --- | --- |
| Text length | Number of words (NW) | # of words |
| | Number of sentences (NS) | # of sentences |
| Type 1: Lexical density | Lexical density (LD) | Nlex / N |
| Type 2: Lexical sophistication | Lexical sophistication 1 (LS1) | Nslex / Nlex |
| | Lexical sophistication 2 (LS2) | Ts / T |
| | Verb sophistication 1 (VS1) | Tsverb / Nverb |
| | Verb sophistication 2 (VS2) | (Tsverb)² / Nverb |
| | Corrected VS1 (CVS1) | Tsverb / √(2 Nverb) |
| Type 3: Lexical variation | Number of different words (NDW) | T |
| | Type/Token ratio (TTR) | T / N |
| | Corrected TTR (CTTR) | T / √(2N) |
| | Root TTR (RTTR) | T / √N |
| | Lexical word variation (LV) | Tlex / Nlex |
| | Squared VV1 (SVV1) | (Tverb)² / Nverb |
| | Corrected VV1 (CVV1) | Tverb / √(2 Nverb) |
| | Verb variation 1 (VV1) | Tverb / Nverb |
| | Verb variation 2 (VV2) | Tverb / Nlex |
| | Noun variation (NV) | Tnoun / Nlex |
| | Adjective variation (AdjV) | Tadj / Nlex |
| | Adverb variation (AdvV) | Tadv / Nlex |
| | Modifier variation (ModV) | (Tadj + Tadv) / Nlex |
| Type 1: Length of production | Mean length of sentences (MLS) | # of words / # of sentences |
| | Mean length of T-unit (MLT) | # of words / # of T-units |
| | Mean length of clause (MLC) | # of words / # of clauses |
| Type 2: Sentence complexity | Clause per sentence (C/S) | # of clauses / # of sentences |
| Type 3: Subordination | Clause per T-unit (C/T) | # of clauses / # of T-units |
| | Complex T-unit ratio (CT/T) | # of complex T-units / # of T-units |
| | Dependent clause per clause (DC/C) | # of dependent clauses / # of clauses |
| | Dependent clause per T-unit (DC/T) | # of dependent clauses / # of T-units |
| Type 4: Coordination | T-unit per sentence (T/S) | # of T-units / # of sentences |
| | Coordinate phrase per clause (CP/C) | # of coordinate phrases / # of clauses |
| | Coordinate phrase per T-unit (CP/T) | # of coordinate phrases / # of T-units |
| Type 5: Particular structures | Complex nominal per T-unit (CN/T) | # of complex nominals / # of T-units |
| | Complex nominal per clause (CN/C) | # of complex nominals / # of clauses |
| | Verb phrase per T-unit (VP/T) | # of verb phrases / # of T-units |

Notes. N = the number of words; Nlex = the number of lexical words; Nslex = the number of sophisticated lexical words; Nverb = the number of verbs; T = the number of word types; Tlex = the number of lexical word types; Ts = the number of sophisticated word types; Tsverb = the number of sophisticated verb types; # = number; / = divided by; T-unit: one main clause + any subordinate
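
To make a few of these formulas concrete, here is a minimal Python sketch of the surface-level measures that need only word and sentence counts: NW, NS, NDW, TTR, RTTR, CTTR and MLS. It is an illustration under stated assumptions, not the procedure used in Kim (2014) or by EnglishGrammar.Pro. The function name basic_complexity_measures is hypothetical, words are found with a simple regular expression, and sentences are split on ., ! and ?. RTTR and CTTR use the standard square-root denominators, T / √N and T / √(2N). Measures that depend on part-of-speech tags, sophistication word lists, clauses or T-units are left out because they need a tagger or parser.

```python
import math
import re


def basic_complexity_measures(text: str) -> dict[str, float]:
    """Compute a few count-based measures with naive tokenization."""
    # Assumption: a word is a run of letters, optionally with one
    # internal apostrophe (e.g. "don't"); case is ignored.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    # Assumption: a sentence ends at ., ! or ?; abbreviations such as
    # "e.g." are not handled and will be over-split.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]

    n = len(tokens)        # N: number of words (NW)
    t = len(set(tokens))   # T: number of word types (NDW)
    ns = len(sentences)    # NS: number of sentences
    if n == 0 or ns == 0:
        raise ValueError("text must contain at least one word and one sentence")

    return {
        "NW": n,
        "NS": ns,
        "NDW": t,
        "TTR": t / n,                     # Type/Token ratio
        "RTTR": t / math.sqrt(n),         # Root TTR
        "CTTR": t / math.sqrt(2 * n),     # Corrected TTR
        "MLS": n / ns,                    # Mean length of sentences
    }


measures = basic_complexity_measures("The cat sat. The cat ran away!")
print(measures["NW"], measures["NS"], measures["NDW"], measures["MLS"])
# prints: 7 2 5 3.5
```

TTR falls as a text gets longer, because common words repeat. That is why the table also lists the root and corrected variants, which divide by a square root of the word count rather than the count itself.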

The main difference between the EnglishGrammar.Pro complexity checker and the measures above is that those measures rely on the length of words, clauses, and similar units.