CLAWS7 tagging errors


‘double’ is a predeterminer here, not a verb (VV0).

New_NP1 York_NP1 City_NN1 's_GE unemployment_NN1 rate_NN1 was_VBDZ nearly_RR 
double_VV0 the_AT national_JJ average_NN1 in_II June_NPM1 ._.



,_, the_AT report_NN1 states_NN2 ._.


I_PPIS1 do_VD0 running_JJ on_II the_AT treadmill_NN1 ._.


(EVPB1) I_PPIS1 provided_CS21 that_CS22 yesterday_RT ._.


To_TO be_VBI kind_NN1 is_VBZ more_RGR important_JJ

Looking at what is more common after VVG and NN1 shows that this tagging is logical, but not correct.

The_AT driving_JJ factor_NN1 for_IF the_AT global_JJ autonomous_JJ vehicles_NN2 market_VV0 is_VBZ due_II21 to_II22 its_APPGE ability_NN1 to_TO tackle_VVI many_DA2 issues_NN2 related_VVN to_II road_NN1 transportation_NN1 ._.

,_, thus_RR making_VVG the_AT dough_NN1 rise_NN1 ._.

go_VVI to_TO jail_VVI ._.


It seems that NN1 NN2 together at the end of a sentence is more common than the singular verb (VVZ). It’s a probability call…

The_AT dog_NN1 walks_NN2 ._.
The_AT man_NN1 walks_NN2 ._.
The_AT man_NN1 talks_NN2 ._.
A_AT1 man_NN1 walks_NN2 ._.
A_AT1 man_NN1 walks_VVZ to_II school_NN1 ._.
We_PPIS2 go_VV0 on_II dog_NN1 walks_NN2 ._.
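To see how often each reading was chosen, the tagged lines above can be tallied by tag sequence. This is only a minimal sketch: the helper names are hypothetical, and it counts what the tagger produced for these examples, not the probabilities CLAWS7 uses internally.

```python
from collections import Counter

def parse_tagged(line):
    """Split 'word_TAG' output into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        word, sep, tag = token.rpartition("_")
        if not sep:
            continue  # skip anything that is not in word_TAG form
        pairs.append((word, tag))
    return pairs

def tag_trigram_counts(lines):
    """Count every run of three consecutive tags in the tagged lines."""
    counts = Counter()
    for line in lines:
        tags = [tag for _, tag in parse_tagged(line)]
        counts.update(zip(tags, tags[1:], tags[2:]))
    return counts

lines = [
    "The_AT dog_NN1 walks_NN2 ._.",
    "The_AT man_NN1 walks_NN2 ._.",
    "The_AT man_NN1 talks_NN2 ._.",
    "A_AT1 man_NN1 walks_NN2 ._.",
    "A_AT1 man_NN1 walks_VVZ to_II school_NN1 ._.",
    "We_PPIS2 go_VV0 on_II dog_NN1 walks_NN2 ._.",
]
counts = tag_trigram_counts(lines)
print(counts[("NN1", "NN2", ".")])   # sentence-final NN1 NN2
print(counts[("NN1", "VVZ", "II")])  # NN1 VVZ followed by a preposition
```

Running the same tally over a larger set of tagged sentences would show whether the verb reading only appears when something follows it, as it does here.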


This is something I have noticed before when I tag text copied from concordance lines with similar grammar: several words run together and get a single tag.

Asyoumayormaynotknow_VV0 ,_, big_JJ things_NN2 are_VBR happening_VVG here_RL 
in_II Williamsburg_NP1 ,_, Brooklyn_NP1 ._. 
Asyoumayhavenoticed_NP1 ,_, theChat_VV0 function_NN1 has_VHZ disappeared_VVN 
from_II the_AT top_NN1 of_IO the_AT Forum_NN1 ._.

This even happens when the sentences above are tagged one at a time!  I think I have found the cause: formatted text causes problems.


Again, punctuation followed by an article should make it clear:



This should never happen: weve_VV0 or wouldnt_VV0 or shouldnt_VV0 or theres_NN2


The tagger adds a space where it did not exist.

We_PPIS2 can_VM not_XX be_VBI responsible_JJ for_IF putting_VVG our_APPGE 
students_NN2 ,_, staff_NN and_CC families_NN2 at_II risk_NN1 ._.


The second error is a much bigger problem; however, it can be coded out.

Console_VV0 players_NN2 neednt_VV0 worry_VV0 about_II PC_NN1 players_NN2 ._.


The following error is understandable, since the tagger cannot tell whether ‘that’ is a determiner or a conjunction here.

I_PPIS1 hope_VV0 that_DD1 overconfidence_NN1 will_VM lead_VVI them_PPHO2 to_TO
do_VDI this_DD1 sooner_RRR than_CSN they_PPHS2 ought_VMK to_TO ._.


More punctuation errors.  I guess that the tagger has a problem with the conjunction directly after was_VBDZ n’t_XX.

It_PPH1 wasnt_VV0 until_CS I_PPIS1 met_VVD my_APPGE husband_NN1 that_CST I_PPIS1 heard_VVD the_AT words_NN2 ,_, I_PPIS1 am_VBM sorry_JJ ,_, will_VM you_PPY please_RR forgive_VVI me_PPIO1 ?_?

This is an easy fix with a pattern replace: wasnt_VV0 -> was_VBDZ n’t_XX
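A pattern replace of this kind could be sketched as a lookup table applied to the tagged output. Only the wasnt entry comes from these notes; the function name and the other tag choices are assumptions that should be checked against the C7 tagset. The code uses a plain ASCII apostrophe.

```python
import re

# Apostrophe-less token -> intended tagged tokens.
# Only 'wasnt' is taken from the notes; the rest are assumed tags.
CONTRACTION_FIXES = {
    "wasnt": "was_VBDZ n't_XX",
    "wouldnt": "would_VM n't_XX",
    "shouldnt": "should_VM n't_XX",
    "neednt": "need_VM n't_XX",
    "weve": "we_PPIS2 've_VH0",
}

# One whole word_TAG token made of letters, bounded by whitespace.
_TOKEN = re.compile(r"(?<!\S)([A-Za-z]+)_[A-Z0-9]+(?!\S)")

def repair_contractions(tagged):
    """Replace mis-tagged apostrophe-less contractions in tagged text."""
    def fix(match):
        word = match.group(1)
        replacement = CONTRACTION_FIXES.get(word.lower())
        if replacement is None:
            return match.group(0)  # leave every other token untouched
        if word[0].isupper():
            # keep a sentence-initial capital
            replacement = replacement[0].upper() + replacement[1:]
        return replacement
    return _TOKEN.sub(fix, tagged)

print(repair_contractions("It_PPH1 wasnt_VV0 until_CS I_PPIS1 met_VVD"))
```

Forms such as its are left out of the table on purpose: without the apostrophe they are also real words, so a blind replace would damage correct tags.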


CLAWS7 has a problem with dare + verb: ‘touch’ should be a verb.

._. The_AT main_JJ premise_NN1 of_IO the_AT show_NN1 is_VBZ puberty_NN1 and_CC all_DB the_AT awkward_JJ ugliness_NN1 that_CST other_JJ shows_NN2 do_VD0 n’t_XX dare_VVI touch_NN1 ._.

This can be solved with a pattern replace.
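One way such a replace might look, as a rough sketch: retag a singular noun directly after dare as an infinitive (VVI). The function name is hypothetical, and the rule is deliberately blunt, so it would also retag a genuine noun that happens to follow dare.

```python
import re

# 'dare' tagged as a verb, followed by a token tagged NN1.
_DARE_NOUN = re.compile(r"(?<!\S)(dare_VV[0I]) ([A-Za-z]+)_NN1(?!\S)")

def retag_verb_after_dare(tagged):
    """Change word_NN1 to word_VVI when it directly follows dare."""
    return _DARE_NOUN.sub(r"\1 \2_VVI", tagged)

print(retag_verb_after_dare("do_VD0 n't_XX dare_VVI touch_NN1 ._."))
```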



I usually discover tagging errors when I can’t understand why the grammar has been tagged incorrectly.  Here is another, where the noun ‘shows’ has been tagged as a verb.  There’s not much to do about this one.

The_AT band_NN1 were_VBDR due_JJ to_TO play_VVI four_MC arena_NN1 shows_VVZ across_II the_AT UK_NP1 this_DD1 September_NPM1 ._.



I am not sure how to deal with the errors that I am finding in the tagging system.  They are also a problem for the coding and grammar tagging of the ‘complexity checker’.  The main problem below is that all the apostrophes are disappearing.  A patch solution is to keep adding pattern-replace rules at the start of the programme as I discover them.

Kid Cudi has announced he’s dropping a new track with Eminem tomorrow, and it’s the collaboration we didn’t know we needed.

Kid_VV0 Cudi_NP1 has_VHZ announced_VVN he_PPHS1 s_VBZ dropping_VVG a_AT1 
new_JJ track_NN1 with_IW Eminem_NP1 tomorrow_RT ,_, and_CC its_APPGE the_AT 
collaboration_NN1 we_PPIS2 did_VDD nt_XX know_VVI we_PPIS2 needed_VVD ._.
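If the missing apostrophes come from typographic (curly) characters in the pasted text, which is only a guess and not something confirmed here, then converting them to plain ASCII before tagging might avoid the problem at the source. A minimal sketch with a hypothetical function name:

```python
# Typographic characters -> plain ASCII equivalents.
ASCII_PUNCTUATION = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark / apostrophe
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})

def normalise_quotes(text):
    """Return text with curly quotes and apostrophes made plain ASCII."""
    return text.translate(ASCII_PUNCTUATION)

raw = "he\u2019s dropping a new track, and it\u2019s the collaboration we didn\u2019t know we needed."
print(normalise_quotes(raw))
```

Tagging the same sentence with and without this step would show quickly whether the guess holds.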



Today I noticed that CLAWS7 is also dropping quotation marks.  I cannot write code for this, since there is no logical way to guess where they should go.  The ones in red below disappear in the automatic tagging.

In introducing a voice feature, Twitter said its motivations are to create a more human experience and remove the ambiguity of using only text.