ELSNET-list archive

Category:   E-Material
Subject:   the IPI PAN Corpus of Polish
From:   Adam Przepiorkowski
Email:   adamp_(on)_ipipan.waw.pl
Date received:   22 Mar 2006

The 2nd edition of the IPI PAN Corpus of Polish, developed at the Institute of Computer Science of the Polish Academy of Sciences (PAS), is available at the web pages of:

- the Institute of Computer Science PAS: http://korpus.pl/en/
- the Institute of Polish Language PAS: http://corpus.ijp-pan.krakow.pl/en/

To the best of our knowledge, this is currently the largest searchable morphosyntactically annotated corpus of Polish available to the public. The whole corpus consists of over 250 million segments (about 200 million orthographic words) and it is not balanced, but a balanced sample of over 30 million segments is also available.

These corpora can be directly searched at the above addresses (do read the query syntax cheatsheet at http://korpus.pl/en/cheatsheet/index.html) or downloaded in a binary form to be used with a standalone version of the corpus search engine Poliqarp (announced separately on the 'corpora' list). Note that the standalone Poliqarp offers much greater functionality than the web interface (e.g., it shows metadata, presents more results, etc.).

Best regards,

Adam P.

--
Adam Przepiorkowski
http://nlp.ipipan.waw.pl/ ----- Linguistic Engineering Group
http://korpus.pl/ ------------- the IPI PAN Corpus of Polish
 

Page generated 14-02-2008 by Steven Krauwer