Russian Newspaper Corpus

The present version of the Russian Newspaper Corpus includes 29506 samples from 9 Russian newspapers, with a total of 7 452 100 text wordforms. The breakdown by newspaper is as follows:

Code  Newspaper                      Samples  Wordforms
iz    Izvestiya                         4240  1 144 100
lg    Literaturnaya Gazeta              1035    483 200
mk    Moskovskiy Komsomolets            5407  1 457 700
ne    Nezavisimaya Gazeta               3621  1 161 800
no    Novaya gazeta                      659    390 100
pr    Pravda 5                          1551    370 300
rv    Rossiyskie Vesti                  1394    352 700
sg    Segodnya                          6096  1 116 600
sp    Sanktpeterburgskiye Vedomosti     5503    975 600
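
As a quick consistency check, the per-newspaper figures can be summed and compared with the totals quoted above. The following is a minimal sketch in Python; the dictionary name BREAKDOWN is only illustrative, and the figures are transcribed by hand from the breakdown.

```python
# Per-newspaper figures transcribed from the breakdown:
# newspaper code -> (samples, wordforms).
BREAKDOWN = {
    "iz": (4240, 1_144_100),
    "lg": (1035, 483_200),
    "mk": (5407, 1_457_700),
    "ne": (3621, 1_161_800),
    "no": (659, 390_100),
    "pr": (1551, 370_300),
    "rv": (1394, 352_700),
    "sg": (6096, 1_116_600),
    "sp": (5503, 975_600),
}

total_samples = sum(samples for samples, _ in BREAKDOWN.values())
total_wordforms = sum(wordforms for _, wordforms in BREAKDOWN.values())

print(total_samples, total_wordforms)  # prints: 29506 7452100
```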

Each sample is preceded by an index. Its first two symbols identify the newspaper (using the codes listed above), and the next symbol identifies the month of publication (a - January, b - February, etc.).

Distribution of text wordforms over months (in thousands):

Month      Wordforms
January           42
April             43
May               25
June             345
July            1116
August           867
September       1093
October         1543
November        1320
December        1058

In the index, the two digits after the month symbol give the day of publication.

These are followed by a three-symbol topic code:

acc  accident       fem  feminism                     occ  occult knowledge
adv  adventure      fin  finance                      ped  paedagogics
agr  agriculture    hap  happening                    poe  poetry
ani  animals        hea  health                       pol  politics
ant  anthropology   his  history                      pre  press
arc  architecture   hum  humanities                   pro  prose
arm  army           jur  journalism (incl. polemics)  psy  psychology
art  visual arts    lab  labour                       rel  religion
bib  bibliography   law  law                          sca  scandal
che  Chechnya       lei  leisure                      sci  science
cin  cinema         lif  life story                   sem  semiotics
com  computers      lit  literature                   soc  society
con  consumerism    lng  language                     spa  space
cor  corruption     mas  mass media                   spo  sport
cri  crime          max  maxims                       spy  spying
cul  culture        med  medicine                     sta  statistics
cur  curiosity      mem  memoir                       tel  television
doc  document       mil  military complex             the  theatre
ecn  economics      min  minorities                   tow  town
eco  ecology        mor  moral                        tra  tradition
edu  education      mus  music                        tur  tourism
eng  engineering    nat  nature                       uni  universal
ess  essay          nec  necrologue                   war  war
fas  fashion        new  news

This eight-symbol index (two symbols for the newspaper, one for the month, two for the day and three for the topic) may be followed by optional symbols giving further specific information:


-a  announcement   -i  interview   -v  home/NIS
-b  book review    -l  letter      -w  NIS/foreign
-d  dispute        -m  memoir      -x  advertisement
-f  foreign        -p  person      -y  history
-g  home/foreign   -r  region
-h  humour         -u  NIS
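
The index layout described above can be decoded mechanically. The sketch below is one possible way to do it in Python, not part of the corpus distribution. The names parse_index and SampleIndex and the example index "izj15pol-i" are hypothetical. It also rests on several assumptions: the index is written in lowercase with no separators between its fields, month letters run consecutively from a (January) to l (December), and optional symbols are attached as "-x" groups that may occur more than once.

```python
import re
from typing import NamedTuple, Tuple

NEWSPAPERS = {
    "iz": "Izvestiya", "lg": "Literaturnaya Gazeta",
    "mk": "Moskovskiy Komsomolets", "ne": "Nezavisimaya Gazeta",
    "no": "Novaya gazeta", "pr": "Pravda 5", "rv": "Rossiyskie Vesti",
    "sg": "Segodnya", "sp": "Sanktpeterburgskiye Vedomosti",
}
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
SUFFIXES = {
    "a": "announcement", "b": "book review", "d": "dispute", "f": "foreign",
    "g": "home/foreign", "h": "humour", "i": "interview", "l": "letter",
    "m": "memoir", "p": "person", "r": "region", "u": "NIS",
    "v": "home/NIS", "w": "NIS/foreign", "x": "advertisement", "y": "history",
}

# Assumed layout: newspaper (2), month (1), day (2), topic (3), then "-x" groups.
INDEX_RE = re.compile(r"^([a-z]{2})([a-l])(\d{2})([a-z]{3})((?:-[a-z])*)$")


class SampleIndex(NamedTuple):
    newspaper: str
    month: str
    day: int
    topic: str               # raw three-symbol topic code, e.g. "pol"
    flags: Tuple[str, ...]   # decoded optional symbols


def parse_index(index: str) -> SampleIndex:
    """Split a sample index into its fields, raising ValueError if malformed."""
    match = INDEX_RE.match(index)
    if match is None:
        raise ValueError(f"malformed index: {index!r}")
    paper, month, day, topic, tail = match.groups()
    if paper not in NEWSPAPERS:
        raise ValueError(f"unknown newspaper code: {paper!r}")
    if not 1 <= int(day) <= 31:
        raise ValueError(f"day out of range: {day!r}")
    letters = tail.split("-")[1:]  # "-i-f" -> ["i", "f"]; "" -> []
    unknown = [c for c in letters if c not in SUFFIXES]
    if unknown:
        raise ValueError(f"unknown optional symbols: {unknown}")
    return SampleIndex(
        newspaper=NEWSPAPERS[paper],
        month=MONTHS[ord(month) - ord("a")],
        day=int(day),
        topic=topic,
        flags=tuple(SUFFIXES[c] for c in letters),
    )


# Hypothetical index: Izvestiya, October 15, politics, interview.
print(parse_index("izj15pol-i"))
```

The topic field is returned as the raw three-symbol code so that it can be looked up in the topic list separately.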
