| ECI Listing
 
List of contents of the ECI CDROM
Id        Type                Language       Size (K words)
 alb01   Word list and Texts    Albanian        205
         (a) Albanian word list, 32K words, with syntactic classes.
             The Albanian dictionary of 1984, published in
             Tirana by the Academy of Sciences.
         (b) The novel "Koncert në fund të dimrit" by Ismail
             Kadare published in Tirana.
 bul01   Technical              Bulgarian          5
         A number of scientific papers from the journal "Science".
 chi01   Newspaper        Chinese       2895
         The PH text corpus described here contains 3.75 million
         Chinese characters. It is a collection of news from
         China's official Xinhua (New China) news agency
         (hereafter XinHua) during a period from January 1990 to
         March 1991. It is GB coded with word and phrase
         boundaries marked.
 cze01   newspaper         Czech         726
         Newspaper texts (Lidove noviny, Literarni noviny)
 cze02   newspaper         Czech        4000
         Newspaper texts (Lidove noviny, Literarni noviny)
 dut01   newspaper         Dutch         600
         Articles from the student newspaper Universiteitskrant
         of the University of Groningen from the academic years
         1990/1991 and 1991/1992.
 dut02   mixed             Dutch        5203
         A large Dutch corpus from INL including transcripts of radio
         programs, newspaper and magazine issues and some technical texts.
 dut03   mixed             Dutch        128
         A continuation of dut02.
 eng01   novels             English      241
         Three English novels from the OTA collection:
         Thomas Hardy           'Far from the Madding Crowd'
         George Eliot           'Silas Marner'
         Charles Dickens        'A Christmas Carol'
 eng02   novels            English       900
         The Complete Sherlock Holmes, Sir Arthur Conan Doyle.
 est01   mixed             Estonian      100
         Extracts from general fiction and prose.
 fre01   newspaper          French      4121
         Text from Le Monde newspaper, consisting of articles
         from September and October 1989, and January 1990.
 gae01   dictionary           Gaelic     141
         MacBain, Alexander, "An etymological dictionary of the
         Gaelic language", Gairm Publications, 1982
         (1st edition 1896, revised 1911)
 ger01   sentenceList        German       20
         Lists of German sentences, tagged with some syntactic information.
         The sentence test suite of DiTo, a linguistic database
         for diagnostics in the syntax components of NLP systems.
 ger02   newspaper          German       191
         German newspaper articles from VDI-Nachrichten, 1990-1991
 ger03   newspaper      German  34291
         Frankfurter Rundschau newspaper text
 ger04   newspaper      German  7376
         Donaukurier newspaper texts
 gre01   mixed             Greek        2515
         Newspapers, periodicals, popular fiction 1976-1990.
 ita01   novels             Italian       13
         Six short stories by G. Verga
 ita03   newspaper              Italian  303
         Corpus of Italian newspapers (La Repubblica, La Stampa,
         Il Mattino, Il Corriere)
 jap01   dictionary        Japanese      203
         EDICT Japanese/English dictionary.
 jap02   Technical              Japanese          148(?)
         Japanese version of the ITU CCITT data.
 lat01   poetry            Latin        75
         Vergil, Aeneid, books I-XII
         Vergil, Georgicon, books I-III
 lit01   Fiction                Lithuanian        20
         "KOLEKCIONIERIUS" Story
 mal01   Technical/Novels       Malay   563
         A collection of original Malay texts and translations
         from English, mainly technical books with some novels.
         From Universiti Sains Malaysia and Dewan Bahasa &
         Pustaka (publishers)
 mul01   Financial               En/Fr/Ge        566
         Financial reports from Union Bank of Switzerland (mostly French-German)
 mul02   technical              Fr/Ge/It         177
         Avalanche bulletins 1986-1991 (ca. 40 per year, ca. 250 words each)
         Swiss Federal Institute for Snow and Avalanche Bulletins.
         (Very little Italian)
 mul03   legal        Fr/Ge/It   227
         Text of the Swiss Civil Code
 mul04   technical               En/Fr/Sp        13497
         International Telecommunication Union CCITT handbook
 mul05   legal             En/Fr/Sp      5000
         International Labour Organisation "Official Bulletin, B Series":
         "Reports of the Committee on Freedom of Association of the Governing
         Body of the ILO and related material 1984-1989".
 mul06   technical          9 EC langs   219
         The announcement text of the EC Esprit program.
 mul07   sentencelist             En/Fr   12
         BABEL project data - French business sentences and English
         translations.
 mul08   novel              En/Serb      386
         George Orwell's "1984" in English, Serbian, Croatian and
         Slovenian versions.
 mul09   technical            5 EC langs         248
         ScanWorX User's Guide (Optical Character Reader)
 mul10   Mixed          English/French          19
         HCRC MT Evaluation Corpus: French/English parallel texts
 mul11   Financial      German/French   615
         Financial Reports from CREDIT SUISSE
 mul12   Legal          Danish/Spanish/English  1199
         The machine-readable 'Civil Law Corpus' from the
         Copenhagen Business School
 mul13   novel             Uzbek/English        72
         Uzbek novel 'Ärk Freedom' with English interlineal translation
 nor01   novels           Norwegian     2226
         Collection of texts in Bokmaal and Nynorsk, including some novels
         and some Ibsen plays.
 por01   mixed            Portuguese     675
         An extract from the Borba/Ramsey corpus of Brazilian Portuguese.
 rus01   technical         Russian      364
         Technical reports (computer related) by Andrei Mikheev
 ser01   stories           Serbian       700
         Short stories and novel extracts
 spa01   speech            Spanish      1041
         Transcribed Spanish speech from
         CORPUS ORAL DE REFERENCIA DEL ESPANOL CONTEMPORANEO 1991-1992
 spa02   newspaper              Spanish  447
         One week of the local Spanish newspaper "Sur" from April and Sept 1991.
 spa03   newspaper          Spanish      830
         "El Diario Vasco" newspaper articles, 1991
 swe01   mixed               Swedish    1718
         A fragment of SUC, the Stockholm-Umea Corpus of modern
         written Swedish. Text extracts (~2000 words each) from
         books and newspapers published after 1990.
 tur01   dictionary        Turkish       173
         PC-KIMMO rule specification and word lists for Turkish morphology
 tur02   newspaper         Turkish       110
         This is news text excerpted from the Anatolia News Agency feed
         covering roughly Sept/Oct 1992. Approximately 10% of the total.
 Total                                   98,792 K words
 
 |