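Corpus-based Translation Studies + Cognitive Space Sharing http://blog.sciencenet.cn/u/carldy Exploring new approaches to translation studies, reflecting on language and cognition research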


A multi-feature/multi-dimensional analysis

Posted 2012-3-26 17:07 | Category: Linguistics | System category: Research notes | Keywords: Analysis, corpus, Linguistics, multi-dimensional

Douglas Biber (1988, 1995) discusses a multidimensional approach to register variation.

The question is:

How can these linguistic features be searched for using different software?

Enclosed here are the search algorithms designed by the authors of "Corpus-based language studies: an advanced resource book".


Search Algorithms

These search algorithms are designed to extract the 58 linguistic features from CLAWS-tagged corpora (C7 tagset) for use in a multi-feature/multi-dimensional analysis. A detailed discussion of the functions of these linguistic features can be found in Biber (1988: 211-245). File-based search patterns can be downloaded below. After downloading, extract the compressed text files into c:\wsmith. The algorithms are designed for use with WordSmith Tools version 3.

After starting WordSmith, go to ‘Settings – Tags’ and activate ‘Tags to ignore’ (<*>). This makes the program ignore everything enclosed in angle brackets (metadata, comments, etc.) in the corpus files. Copy and paste the search patterns into the ‘Search word or phrase’ text box. Adjust ‘Context words & Context search horizons’ (left and right) where appropriate, as specified for the individual algorithms.
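As a rough illustration of what the ‘Tags to ignore’ setting and a slash-separated search pattern do, the Python sketch below strips angle-bracket markup from a CLAWS-style word_TAG text and counts hits for the 2nd person pronoun pattern listed under Factor 1. It is not WordSmith's own implementation: the function names tokenize and count_hits, the sample sentence, and the use of Python's fnmatch wildcards (* for any string, ? for one character) as a stand-in for WordSmith's wildcards are all assumptions made for this example. Only single-token patterns are handled.

```python
import re
from fnmatch import fnmatchcase


def tokenize(text):
    """Drop everything inside <...>, then split the rest into word_TAG tokens."""
    without_markup = re.sub(r"<[^>]*>", " ", text)
    return without_markup.split()


def count_hits(tokens, pattern):
    """Count tokens that match any '/'-separated alternative, ignoring case."""
    alternatives = [alt.strip().upper() for alt in pattern.split("/")]
    return sum(
        any(fnmatchcase(token.upper(), alt) for alt in alternatives)
        for token in tokens
    )


# Hypothetical sample; real input would be a C7-tagged corpus file.
sample = (
    "<s n=1> You_PPY said_VVD your_APPGE idea_NN1 "
    "was_VBDZ n't_XX yours_PPGE ._. </s>"
)
tokens = tokenize(sample)

second_person = count_hits(
    tokens, "*_PPY/your_APPGE/yourself_PPX1/yourselves_PPX2/yours_PPGE"
)
analytic_negation = count_hits(tokens, "*_XX")
print(second_person, analytic_negation)
```

Matching is done on upper-cased strings so that You_PPY and your_APPGE are both found, which mirrors a case-insensitive search; a case-sensitive search would need the upper-casing removed.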

Factor 1 (28 linguistic features):

(1) private verbs: c:\wsmith\privatev.txt

(2) THAT deletion: c:\wsmith\thatdel1.txt – c:\wsmith\thatdel8.txt

(3) contraction: *'*

Context 1L 2R =~*_GE/~"_"/~*_NP*/~*_NN*/~*_MC*/~*_RA/~*_UH*/~*_FO/~'_"

(4) present tense verbs: c:\wsmith\present.txt

(5) 2nd person pronouns: *_PPY/your_APPGE/yourself_PPX1/yourselves_PPX2/yours_PPGE

(6) DO as pro-verb: *_VD*

Context 0L 4R =~*_XX/~*_PPY/~*_PP?S*/~*_V?I

(7) analytic negation: *_XX

(8) demonstrative pronouns: this_DD1/that_DD1/these_DD2/those_DD2

Context 0L 3R=~*_NN*/~*_NP*/~*_PN1

(9) general emphatics: c:\wsmith\emphatic.txt

(10) 1st person pronouns: *_PPI*/my_APPGE/our_APPGE/myself_PPX1/ourselves_PPX2/mine_PPGE/ours_PPGE

(11) pronoun IT: it_PPH1

(12) BE as main verb: *_VB*

Context 0L 3R =*_D*/*_A*/*_NNB/*_I*/*_J*/~*_V?G/~*_V?N

(13) causative subordination: because_CS

(14) discourse markers:

a) well_* context 1L 0R = ~AS_*/~FEEL*_V*/~FELT_V*;

b) now_*/anyway*_*/anyhow_*

Context 2L 0R =?_?/AND_*/BUT_*/*_UH/~*_V*/~RIGHT_*

(15) indefinite pronouns: none_PN/*_PN1

(16) general hedges: c:\wsmith\hedge.txt

(17) amplifiers: c:\wsmith\amplify.txt

(18) sentence relatives: ,_, which_DDQ

(19) WH questions: ?_? WHAT_DDQ/?_? *_RRQ

Context 0L 3R =*_VD*/*_VB*/*_VH*/*_VM*

(20) possibility modals: can_VM/ca_VM/could_VM/may*_VM/might_VM

(21) non-phrasal coordination:

a) ,_, AND_CC IT_P*/,_, AND_CC SO_*/,_, AND_CC THEN_*/,_, AND_CC YOU_PPY*


c) ,_, AND_CC TH*_DD1/,_, AND_CC TH*_DD2/,_, AND_CC *_PP?S*

(22) WH clauses: c:\wsmith\pps.txt context 0L 3R= *_DDQ/~?_?/~*_I*

(23) final prepositions: *_I* context 0L 2R=?_?/~(_(

(24) other nouns: *_NN*/*_NP*/*_ND1

Context 0L 0R = ~*TION*_N*/~*MENT*_N*/~*NESS*_N*/~*ITY_N*/~*ITIES_N*

(25) word length: (WordSmith wordlist function: average word length)

(26) prepositions: *_I*

(27) type/token ratio: (WordSmith wordlist function: standardized type/token ratio)

(28) attributive adjectives: *_JJ *_NN*/*_JJ *_JJ
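Features (25) and (27) come from WordSmith's wordlist function rather than from a search pattern. For readers without WordSmith, the following Python sketch shows one way such figures could be approximated from a word_TAG text. It rests on several assumptions: the helper names are invented for this example, punctuation tokens are simply skipped, and the standardized type/token ratio is taken to be the mean of the type/token ratios of consecutive fixed-size chunks (1000 words is assumed as the default chunk size, with any incomplete final chunk ignored). WordSmith's own word-boundary rules may give slightly different numbers.

```python
def plain_words(tagged_text):
    """Return the word part of each word_TAG token, skipping punctuation."""
    words = []
    for token in tagged_text.split():
        word = token.rsplit("_", 1)[0]
        if any(ch.isalnum() for ch in word):
            words.append(word)
    return words


def average_word_length(words):
    """Mean number of characters per word."""
    return sum(len(word) for word in words) / len(words)


def standardized_ttr(words, chunk_size=1000):
    """Mean type/token ratio (in %) over consecutive full chunks of words.

    Returns None when the text is shorter than one chunk.
    """
    ratios = []
    for start in range(0, len(words) - chunk_size + 1, chunk_size):
        chunk = [word.lower() for word in words[start:start + chunk_size]]
        ratios.append(100.0 * len(set(chunk)) / chunk_size)
    return sum(ratios) / len(ratios) if ratios else None


# Hypothetical toy text; a tiny chunk size is used only to make it visible.
tagged = (
    "the_AT cat_NN1 saw_VVD the_AT dog_NN1 ._. "
    "the_AT dog_NN1 ran_VVD away_RL ._."
)
words = plain_words(tagged)
print(average_word_length(words))
print(standardized_ttr(words, chunk_size=4))
```

Averaging over equal-sized chunks is what makes the ratio comparable across texts of different lengths, since a plain type/token ratio falls as a text gets longer.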

Factor 2 (6 linguistic features):

(29) past tense verbs: *_V?D*

(30) 3rd person pronouns: c:\wsmith\3persprn.txt

(31) perfect aspect verbs: c:\wsmith\perf_asp.txt

(32) public verbs: c:\wsmith\publicv.txt

(33) synthetic negation: no_AT/neither_*/nor_*

(34) present participial clauses: ,_, *_V?G *_I*/,_, *_V?G *_D*/,_, *_V?G *_P*/,_, *_V?G *_R*

Context 3L 0R= ~*_VB*

Factor 3 (7 linguistic features):

(35) WH relative clauses: *_NN* *_PNQ*/WHICH*_DDQ*/WHOSE_DDQGE

Context 1L 0R= ~ASK*_V*/~TELL*_V*/~TOLD_V*/~*_I*/~?_?

(36) pied piping constructions: *_NN* *_PNQ*/WHICH*_DDQ*/WHOSE_DDQGE

Context 1L 0R =*_I*

(37) phrasal coordination: *_R* and_CC *_R*/*_J* and_CC *_J*/*_V* and_CC *_V*/*_N* and_CC *_N*

(38) nominalizations: *tion_N*/*tions_N*/*ment_N*/*ments_N*/*ness_N*/*nesses_N*/*ity_N*/*ities_N*

(39) time adverbials: *_RT*

(40) place adverbials: *_RL*

(41) other adverbs: *_R* minus all totals of hedges, amplifiers, downtoners, place adverbials and time adverbials
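Feature (41) is obtained by subtraction rather than by a search of its own. A minimal Python sketch of that arithmetic follows; the function name and the counts in the example are hypothetical and serve only to show the calculation.

```python
def other_adverbs(all_adverbs, hedges, amplifiers, downtoners,
                  place_adverbials, time_adverbials):
    """Total *_R* hits minus the adverb classes counted as separate features."""
    remainder = all_adverbs - (
        hedges + amplifiers + downtoners + place_adverbials + time_adverbials
    )
    if remainder < 0:
        # A negative result means the sub-counts and the total disagree.
        raise ValueError("sub-counts exceed the total number of adverbs")
    return remainder


# Hypothetical counts for one text.
result = other_adverbs(
    all_adverbs=412,
    hedges=18,
    amplifiers=37,
    downtoners=21,
    place_adverbials=64,
    time_adverbials=59,
)
print(result)  # 412 - 199 = 213
```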

Factor 4 (6 linguistic features):

(42) infinitives: to_TO *_V?I/to_TO *_R* *_V?I/to_TO *_R* *_R* *_V?I

(43) prediction modals: will_VM/wo_VM/shall_VM/sha_VM/'ll_VM/would_VM/'d_VM

(44) suasive verbs: c:\wsmith\suasivev.txt

(45) conditional subordination: if_CS/unless_CS

(46) necessity modals: ought_VM*/should_VM/must_VM

(47) split auxiliaries: c:\wsmith\splitaux.txt

Factor 5 (6 linguistic features):

(48) conjuncts: c:\wsmith\conjunct.txt

(49) agentless passives: c:\wsmith\agtlspsv.txt

Context 0L 6R=~by_II

(50) past participial clauses: ?_? *_V?N *_I*/?_? *_V?N *_R*

(51) BY-passives: c:\wsmith\by_psv.txt

Context 0L 6R=by_II

(52) past participial WHIZ deletions: c:\wsmith\whizdel.txt

Context 2L 0R= ~GET*_V*/~GOT_V*/~*_VH*

(53) other adverbial subordinators: c:\wsmith\otheradv.txt

Factor 6 (4 linguistic features):

(54) THAT clauses as verb complements: *_V* that_CST

(55) demonstratives: THESE_DD2/THOSE_DD2/THIS_DD1/THAT_DD1

Context 0L 3R= *_NN*/*_NP*/*_PN1

(56) THAT relative clauses: *_NN* THAT_CST

Context 0L 4R= *_AT*/*_D*/*_NP*/*_PP*/*_N*2*

(57) THAT clauses as adjective complements: *_JJ that_CST

Context 1L 0R= ~so_*
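Several of the algorithms above pair a search pattern with a ‘Context’ line giving left and right horizons. The Python sketch below shows one possible reading of those lines, using the same search word for demonstratives (55) and demonstrative pronouns (8). The interpretation is an assumption, not a description of WordSmith internals: a hit is kept if at least one plain context alternative occurs within the horizon (when any are given) and no alternative prefixed with ~ occurs there. The function names, the sample sentence and the fnmatch-style wildcards are likewise illustrative, and only single-token search words are handled.

```python
from fnmatch import fnmatchcase


def matches(token, alternatives):
    """True if the token matches any wildcard alternative, ignoring case."""
    return any(fnmatchcase(token.upper(), alt.upper()) for alt in alternatives)


def search(tokens, pattern, context="", left=0, right=0):
    """Return the positions of hits that pass the context filter."""
    node_alternatives = pattern.split("/")
    context_items = [c.strip() for c in context.split("/") if c.strip()]
    required = [c for c in context_items if not c.startswith("~")]
    excluded = [c[1:] for c in context_items if c.startswith("~")]

    hits = []
    for i, token in enumerate(tokens):
        if not matches(token, node_alternatives):
            continue
        # Tokens inside the left and right horizons, excluding the hit itself.
        window = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
        if required and not any(matches(w, required) for w in window):
            continue
        if any(matches(w, excluded) for w in window):
            continue
        hits.append(i)
    return hits


# Hypothetical C7-tagged sample.
tokens = (
    "This_DD1 is_VBZ good_JJ ._. "
    "I_PPIS1 like_VV0 this_DD1 new_JJ book_NN1 ._."
).split()
node = "this_DD1/that_DD1/these_DD2/those_DD2"

# (55) demonstratives: a noun must follow within 3 words to the right.
demonstratives = search(tokens, node, "*_NN*/*_NP*/*_PN1", left=0, right=3)
# (8) demonstrative pronouns: no noun may follow within 3 words.
pronouns = search(tokens, node, "~*_NN*/~*_NP*/~*_PN1", left=0, right=3)
print(demonstratives, pronouns)
```

Under this reading the two features split the same set of hits between them: the sentence-initial This is counted as a pronoun, while this in "this new book" is counted as a demonstrative.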


Factor 7 (1 linguistic feature):

(58) SEEM/APPEAR: seem*_V*/appear*_V*

