Wmatrix is a software tool for corpus analysis and comparison. It provides a web interface to the USAS and CLAWS corpus annotation tools, and to standard corpus linguistic methodologies such as frequency lists and concordances. It also extends the keywords method to key grammatical categories and key semantic domains.
Wmatrix allows the user to run these tools via a web browser such as Opera, Firefox or Internet Explorer, and so will run on any computer (Mac, Windows, Linux, Unix) with a web browser and a network connection. Wmatrix was initially developed by Paul Rayson in the REVERE project, was extended and applied to corpus linguistics during his PhD work, and is still being updated regularly. Earlier versions were available for Unix via terminal-based command line access (tmatrix) and Unix via Xwindows (Xmatrix), but these only offer retrieval of text pre-annotated with USAS and CLAWS.
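The keywords comparison described above can be illustrated with a small sketch. It assumes the log-likelihood statistic that is commonly used for keyness in corpus linguistics; the text here does not state which statistic Wmatrix applies, and the function names (log_likelihood, key_items) are hypothetical, not part of Wmatrix. Because the functions only count labels, the same comparison works whether the input sequences hold words, part-of-speech tags or semantic tags, which is the sense in which the method extends to key grammatical categories and key semantic domains.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness score for one item.

    a: frequency in the study corpus, b: frequency in the reference corpus,
    c: size of the study corpus, d: size of the reference corpus.
    """
    # Expected frequencies if the item were spread evenly over both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def key_items(study, reference, top=5):
    """Rank items (words or tags) by log-likelihood, highest first.

    Returns tuples of (score, item, study frequency, reference frequency).
    Note that a high score marks both overuse and underuse in the study
    corpus; compare the two frequencies to tell them apart.
    """
    study_freq = Counter(study)
    ref_freq = Counter(reference)
    c, d = len(study), len(reference)
    scored = []
    for item in set(study_freq) | set(ref_freq):
        a, b = study_freq[item], ref_freq[item]
        scored.append((log_likelihood(a, b, c, d), item, a, b))
    return sorted(scored, reverse=True)[:top]


if __name__ == "__main__":
    # Toy token lists, invented for illustration only.
    study_tokens = ["war"] * 5 + ["the"] * 5
    reference_tokens = ["the"] * 5 + ["peace"] * 5
    for score, item, a, b in key_items(study_tokens, reference_tokens):
        print(f"{item}\t{a}\t{b}\t{score:.2f}")
```

An item with the same relative frequency in both corpora scores zero, so only items whose usage differs between the two texts rise to the top of the list.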
Publications and applications:
Systems engineering: see the publications listed under the REVERE project. For example: Sawyer, P., Rayson, P. and Cosh, K. (2005) Shallow Knowledge as an Aid to Deep Understanding in Early Phase Requirements Engineering. IEEE Transactions on Software Engineering. Volume 31, number 11, November, 2005, pp. 969 - 981. ISSN 0098-5589. doi: http://doi.ieeecomputersociety.org/10.1109/TSE.2005.129
Aspect oriented requirements engineering: identification of early aspects. See, for example: Chitchyan, R., Sampaio, A., Rashid, A. and Rayson, P. (2006). Evaluating EA-Miner: Are Early Aspect Mining Techniques Effective? In proceedings of Towards Evaluation of Aspect Mining (TEAM 2006). Workshop Co-located with ECOOP 2006, European Conference on Object-Oriented Programming, 20th edition, July 3-7, Nantes, France, pp. 5-8.
Corpus-based impact analysis of academic research: Francois Taiani, Paul Grace, Geoff Coulson and Gordon Blair (2008) Past and future of reflective middleware: Towards a corpus-based impact analysis. The 7th Workshop On Adaptive And Reflective Middleware (ARM'08) December 1st 2008, Leuven, Belgium, collocated with Middleware 2008.
Ontology learning: Gacitua, R., Sawyer, P., Rayson, P. (2008). A flexible framework to experiment with ontology learning techniques. In Knowledge-Based Systems, 21, 3, April 2008, pp. 192-199. DOI: 10.1016/j.knosys.2007.11.009
Frequency profile comparison of written and spoken English: See Leech, G., Rayson, P., and Wilson, A. (2001). Word Frequencies in Written and Spoken English: based on the British National Corpus. Longman, London. (see the companion website for more details)
Political science research: Beigman Klebanov, B., Diermeier, D., and Beigman, E. 2008. Automatic annotation of semantic fields for political science research. Journal of Language Technology and Politics 5(1):95-120. http://www.cs.huji.ac.il/~beata/publications.html
Corpus stylistics (2): A number of papers were presented at the PALA 2007 conference (29-30 July 2007, Kansai Gaidai University, Osaka, Japan) including those by Geoffrey Leech, Yu-fang Ho, Dan McIntyre, Haruko Sera, Brian Walker. Mick Short and Brian Walker also ran a Workshop: Using Wmatrix to compare scenes from Harold Pinter's Betrayal. See the book of abstracts on the conference website for more details.
Training chatbots: comparison of human-human and human-machine dialogues. See Abu Shawar, Bayan; Atwell, Eric. Using dialogue corpora to train a chatbot. In Archer, D, Rayson, P, Wilson, A & McEnery, T (editors) Proceedings of CL2003: International Conference on Corpus Linguistics, pp. 681-690. Lancaster University, 2003.
Computer content analysis: analysis of interview transcripts.
Computer content analysis of political discourse. See Xin Huang (2003) A Computer-aided Diachronic Content Analysis of Twentieth Century Political Discourse in China. MA dissertation in Language Studies, Lancaster University.
Key word analysis (1): See Marilyn Deegan, Harold Short, Dawn Archer, Paul Baker, Tony McEnery, Paul Rayson (2004) Computational Linguistics Meets Metadata, or the Automatic Extraction of Key Words from Full Text Content. RLG Diginews, Vol. 8, No. 2. ISSN 1093-5371.
Key word analysis (2): Walkerdine, J. and Rayson, P. (2004) P2P-4-DL: Digital Library over Peer-to-Peer. In Caronni G., Weiler N., Shahmehri N. (eds.) Proceedings of Fourth IEEE International Conference on Peer-to-Peer Computing (P2P2004) 25-27 August 2004, Zurich, Switzerland. IEEE Computer Society Press, pp. 264-265. ISBN 0-7695-2156-8.
Key word-class analysis for EAP: See Jones, M., Rayson, P. and Leech, G. (2004) Key category analysis of a spoken corpus for EAP. Presented at The 2nd Inter-Varietal Applied Corpus Studies (IVACS) International Conference on "Analyzing Discourse in Context" The Graduate School of Education, Queen’s University, Belfast, Northern Ireland, 25 - 26 June, 2004.
Phraseology: Magali Paquot, Sylviane Granger, Paul Rayson and Cédrick Fairon (2004) Extraction of multi-word units from EFL and native English corpora: The phraseology of the verb 'make'. Presented at Europhras, European Society of Phraseology, 26-29 August 2004, Basel, Switzerland.
Comparison of political party manifestos: (Labour versus LibDem UK 2001 General Election) Paul Rayson (2004). Keywords are not enough. Invited talk for JAECS (Japan Association for English Corpus Studies) at Chuo University, Tokyo, Japan, 27th November 2004. (slides)
Metaphors in political discourse: Emilie L'Hote and Maarten Lemmens (2009) Reframing treason: metaphors of change and progress in new Labour discourse. CogniTextes, Volume 3, http://cognitextes.revues.org/index248.html
Key domain analysis (2): Archer, D., Culpeper, J. and Rayson, P. (2005) Love - a familiar or a devil? An exploration of key domains in Shakespeare’s Comedies and Tragedies. Presented at the AHRC ICT Methods Network Expert Seminar on Linguistics. Lancaster University, 8 September 2005.
Key domain analysis (3): Yufang Ho. (2007) Investigating the key concept differences between the two editions of John Fowles's The Magus - a corpus semantic approach. The 27th International Conference of the Poetics and Linguistics Association (PALA), Kansai Gaidai University, Hirakata, Osaka, Japan, 31 July - 4 August 2007.
Key domain analysis (4): Afida Mohamad Ali (2007). Semantic fields of problem in business English: Malaysian and British journalistic business texts. Corpora, 2, 2, pp. 211-239.
Analysis of online language (1): Vincent B.Y. Ooi, Peter K.W. Tan & Andy K.L. Chiang (2007) Analyzing personal weblogs in Singapore English: the Wmatrix approach. Studies in Variation, Contacts and Change in English. Volume 2. Research Unit for Variation, Contacts and Change in English (VARIENG), University of Helsinki. http://www.helsinki.fi/varieng/journal/volumes/02/ooi_et_al/
Analysis of online language (2): Vincent B.Y. Ooi (2008) The lexis of electronic gaming on the Web: a Sinclairian approach. International Journal of Lexicography, 21 (3), 311-323. doi: 10.1093/ijl/ecn021
e-learning materials development: Nakano, T. and Koyama, Y. (2005). e-Learning Materials Development Based on Abstract Analysis Using Web Tools. Knowledge-Based Intelligent Information and Engineering Systems. 9th International Conference, KES 2005, Melbourne, Australia, September 14-16, 2005, Proceedings, Part I, LNCS 3681, Springer, pp. 794-800. DOI 10.1007/11552413_113
Linguistic modality study: Gabrielatos, C. and McEnery, T. (2005). Epistemic modality in MA dissertations. In Fuertes Olivera, P.A. (ed.) Lengua y Sociedad: Investigaciones recientes en lingüística aplicada. Lingüística y Filología no. 61. Valladolid: Universidad de Valladolid, pp. 311-331.
Entrepreneurship studies: Doherty, N., Lockett, N., Rayson, P. and Riley, S. (2006). Electronic-CRM: a simple sales tool or facilitator of relationship marketing? 29th Institute for Small Business & Entrepreneurship Conference. International Entrepreneurship - from local to global enterprise creation and development. 31 October - 2 November 2006, Cardiff-Caerdydd, UK.