Welcome to the mwetoolkit website

The Multiword Expressions toolkit (mwetoolkit) aids in the automatic identification of multiword units, such as idiomatic expressions (kick the bucket) and phrasal verbs (take off, give up), in large text corpora, independently of the language.


  • NEW! 11 Oct 2016: We have prepared a brand new CRF tagger for multiword expressions. Enjoy!
  • 28 Oct 2015: We have prepared some extra documentation that explains all tools and their arguments.
  • 08 Oct 2015: Release 1.1 is available for download on GitLab. Take a look at the installation page. Enjoy!
  • 14 Sep 2015: We migrated the repository to GitLab. Please remove your local SVN copy and follow the updated installation instructions for Git.
  • 17 Apr 2015: Release 1.0 is finally available for download on SourceForge! We also fixed many bugs, and the tests are working again. Enjoy!
  • 10 Nov 2014: We updated the installation instructions and included a guide to using the toolkit on Windows via Cygwin.
  • 03 Nov 2014: The reference book of the mwetoolkit, including an overview of MWE processing, is now available to order in print or as an e-book on the Springer website.
  • 26 Aug 2014: We have changed the look of the website. In the meantime, the mwetoolkit keeps getting better and better! Unfortunately, we cannot release new versions for the moment, but feel free to use the SVN repository for the freshest updates. We answer emails quickly in case something goes wrong ;-)
  • 25 Mar 2013: The documentation of the patterns has been updated. We also updated the SVN repository after the SourceForge migration, so please do a fresh checkout.
  • 09 Mar 2012: Release 0.5 is now available on SourceForge! It comes with lots of new features and up-to-date documentation. Enjoy :-)
  • 17 May 2011: The mwetoolkit release 0.4 is now available on SourceForge!
  • 17 May 2011: We created a Quick Start guide to help you understand how the toolkit works.


The mwetoolkit is free. However, we kindly ask you to cite the following book and/or article in your publications:

  • Carlos Ramisch, "Multiword Expressions Acquisition: A Generic and Open Framework", Theory and Applications of Natural Language Processing series XIV, Springer, ISBN 978-3-319-09206-5, 230 p., 2015.

  • Carlos Ramisch, "A Generic Framework for Multiword Expressions Treatment: from Acquisition to Applications", Proceedings of the ACL 2012 Student Research Workshop, Jeju, Republic of Korea, July 2012.


The mwetoolkit supports:

  • Efficient multilevel regexp-like searches in large corpora.
  • Efficient n-gram and word counting in large corpora.
  • Association measures, contrastive measures, variations, etc.
  • Quick evaluation tools.
  • Useful tools for formatting, preprocessing, combining information, etc.
  • Token-instance annotation through lexicon and patterns projection.
  • ...
  • Go to the Installation page to try it out!
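
To give a rough idea of what n-gram counting and association measures mean in practice, here is a minimal, self-contained Python sketch. It is not the mwetoolkit's own code or command-line interface: the function name `bigram_pmi` and the toy corpus are made up for illustration, and pointwise mutual information (PMI) is just one common association measure, chosen here as an example.

```python
import math
from collections import Counter


def bigram_pmi(sentences):
    """Score every adjacent word pair with pointwise mutual information.

    `sentences` is a list of token lists. This is an illustrative helper,
    not a function provided by the mwetoolkit.
    """
    word_counts = Counter()
    bigram_counts = Counter()
    for tokens in sentences:
        word_counts.update(tokens)
        # Adjacent pairs within one sentence; pairs never cross sentences.
        bigram_counts.update(zip(tokens, tokens[1:]))

    n_words = sum(word_counts.values())
    n_bigrams = sum(bigram_counts.values())

    scores = {}
    for (w1, w2), count in bigram_counts.items():
        p_pair = count / n_bigrams
        p_w1 = word_counts[w1] / n_words
        p_w2 = word_counts[w2] / n_words
        # PMI = log2( P(w1, w2) / (P(w1) * P(w2)) )
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Toy corpus (hypothetical data, far too small for reliable scores).
corpus = [
    "he will give up soon".split(),
    "they give up too easily".split(),
    "give the book back".split(),
]

scores = bigram_pmi(corpus)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
    print(pair, round(score, 2))
```

On a corpus this small, rare pairs get inflated scores, which is a known weakness of PMI. Real candidate extraction usually adds frequency thresholds and compares several measures.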

