The study examines the territorial organization of the Uralic peoples and languages, describes regional interaction and population dynamics, addresses the problems of the ancestral homeland and ethnogenesis and the role of Finno-Ugrians in the history of Russia, and highlights the acute issues of preserving languages and culture. The research focuses on some features of the development of the Uralic ethnic groups within the framework of Russian civilization. It is argued that the ethnic and political consolidation of most Finno-Ugric tribes originated within the ancient Russian state, where the first shoots of the future unity of peoples appeared; their role in the large-scale nation- and state-building that unfolded across the vast Eurasian territory of Russia between the 16th and 20th centuries is also analyzed. The outcomes confirm that the Finno-Ugric peoples have always been an organic part of the Russian ethno-cultural mosaic. They actively participated in the strengthening of the state, developed its vast natural wealth, and helped to create the economic power of the country. Because Finno-Ugric studies pay considerable attention to the preservation of languages and culture among the Samoyed peoples, who cannot be called Finno-Ugric, we propose wider use of the term "Uralistics". It is a more accurate concept and can be used in the study of cultural processes among the peoples united in the Uralic language family.
The written culture of Russia's Uralic peoples began at different moments and in different ways. This article presents the main features of this evolution and concentrates on one short but significant episode: the period in the 1930s when the Uralic languages, like most of the languages of the Soviet nationalities, adopted the Latin alphabet. We analyse the origins of and reasons for this development, and its influence and consequences for the different Finno-Ugric languages of Russia: the Volga and Permian languages (including Komi), the Finnic languages, and the languages of the peoples of the North. In conclusion, we examine the overall results of these changes and the political shifts at the end of the decade, which led everywhere to the replacement of the Latin alphabet by the Cyrillic one, together with the reasons for this transition and the manner in which it was carried out.
The Ugric languages Mansi, Khanty and Hungarian form a branch of the Uralic language family, which is mainly spread across North-Eastern Europe and Siberia. Other prominent Uralic languages include Finnish, Saami and Estonian. Mansi and Khanty are spoken in Western Siberia along the river Ob' and its tributaries and are therefore referred to as Ob-Ugric. Their closest relative is Hungarian, spoken in Hungary and its neighboring countries. Khanty and Mansi are endangered: only 20% out of 8,000 ethnic Mansi and 30% out of 22,000 ethnic Khanty still speak their mother tongue, and there are nearly no monolingual speakers. In contrast, Hungarian is an official language of the European Union, spoken by about 15 million people. Hence, the status of literacy, language documentation and language education differs noticeably between Ob-Ugric and Hungarian. From a typological point of view, the Ugric languages are basically so-called SOV languages, and their morphology is mainly agglutinative, i.e. grammatical information is encoded mainly by suffixes attached to the stem rather than by prepositions, pronouns or articles. The most accessible referent in a discourse is not overtly realized on the surface of the sentence; its position remains empty (zero anaphora). This is also reflected in the rich paradigms of personal suffixes which are used instead. One set of personal suffixes is attached to nominal stems and called possessive suffixes. They are involved in the structure of so-called attributive possessive constructions in most Uralic languages. As their name suggests, research on possessive suffixes in Ugric languages, as in most Uralic languages, has primarily viewed them in the light of their function as markers of possessive relations, traditionally referred to as their prototypical use. The linguistic concept of possession seems to be universal.
The notion of possession itself, though, is purely abstract and can only be understood as a "broader concept of association or relationship between two nouns". While the definition is an abstract collective term, there is a broad consensus among linguists that certain prototypical meanings are covered by the concept of possession. These are: part-whole relations, kinship relations (both by blood and marriage), ownership relations, and a fourth category covering all kinds of association in general (e.g. attribution, properties or orientation/location). Attributive possessive constructions are very frequent in most Uralic languages and, in a considerable number of cases, a possessive reading of the relation is excluded, even under the most abstract interpretation of possession. Such cases, where the so-called prototypical use of possessive suffixes (i.e. denoting a possessive relation) fails to serve as an explanation, are frequently subsumed under the label of non-prototypical use, and a secondary, non-possessive function is attributed to possessive suffixes. This secondary function is, for instance, likened to the properties of a definite article.
Open-source analyzer dictionary development is being implemented for Skolt Sami, Ingrian, Moksha-Mordvin, etc. in the Helsinki CSC infrastructure, home of the Finnish Kielipankki 'Language Bank' and Termipankki 'Term Bank'. The proximity of minority-language corpora in need of annotation and the multiple usage of controlled Wikimedia-type dictionaries make CSC an attractive site for synchronized transducer dictionary development. The open-source FST development of Uralic and other minority languages at Giellatekno-Divvun in Tromsø demonstrates a vast potential for reuse of FSTs, only augmented by open-source work in OmorFi, Apertium and Universal Dependencies. The initial idea is to allow synchronized editing of Giellatekno XML and CSC wiki structures via GitHub. In addition to allowing for simple lexc LEMMA:STEM CONTINUATION_LEXICON "TRANSLATION" ; line exports, the parallel dictionaries will provide for documentation of derivation, morpho-syntactic information on valency and government, semantics and etymology. ; Peer reviewed
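The lexc line export mentioned above can be illustrated with a minimal Python sketch. It only formats one dictionary entry as a LEMMA:STEM CONTINUATION_LEXICON "TRANSLATION" ; line. The `Entry` class, the function names, the continuation lexicon name `N_TALO` and the set of characters escaped with `%` are assumptions made for illustration; they are not taken from the Giellatekno or CSC tooling, and a real export would have to follow the escaping rules of the lexc compiler in use.

```python
from dataclasses import dataclass

# Characters this sketch escapes with '%' (assumed, simplified set;
# check the lexc compiler's documentation for the authoritative list).
LEXC_SPECIAL = set('%!;:<>"0 ')


def escape_lexc(text: str) -> str:
    """Prefix every character from LEXC_SPECIAL with the '%' escape."""
    return "".join("%" + ch if ch in LEXC_SPECIAL else ch for ch in text)


@dataclass
class Entry:
    """Hypothetical dictionary record with the four exported fields."""
    lemma: str
    stem: str
    contlex: str       # name of the continuation lexicon
    translation: str   # gloss placed in the quoted info string


def to_lexc_line(entry: Entry) -> str:
    """Format an entry as: LEMMA:STEM CONTLEX "TRANSLATION" ;"""
    # Double quotes would end the info string early, so swap them out.
    gloss = entry.translation.replace('"', "'")
    return (
        f"{escape_lexc(entry.lemma)}:{escape_lexc(entry.stem)} "
        f'{entry.contlex} "{gloss}" ;'
    )


if __name__ == "__main__":
    # Finnish 'talo' (house) with a made-up continuation lexicon name.
    print(to_lexc_line(Entry("talo", "talo", "N_TALO", "house")))
```

Keeping the entry as a structured record and generating the lexc line from it is one way the richer fields named in the abstract (derivation, valency, etymology) could live alongside the export without being squeezed into the lexc file itself.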
This paper provides a brief description of Skolt Sami and of how it might be construed as a pluricentric language. Historical factors are identified that might contribute to a pluricentric identity: geographic location and political history, shortages of language documentation, and the establishment of a normative body for the development of a standard language. Skolt Sami is assessed in the context of the Sami languages and is put forward as one member of a closely related yet distinct group of languages. The issue then becomes one of facilitating diversity even for under-documented languages, and we describe opportunities in language technology that have been used to this end. Finally, brief insight is given into other Uralic languages with regard to their pluricentric character and the possibilities for language users to maintain their individual language needs. ; Peer reviewed
Drawing on archival documents introduced into scholarly circulation for the first time, the article examines the everyday practices of the party organization of the Sverdlovsk Obllit (regional censorship office). It analyzes the social and living conditions of the Ural censors, their needs, and their expectations of the authorities. The author argues that the continual turnover of staff in the censorship agency in the second half of the 1930s can be explained by the so-called human factor. ; The study follows the history-of-everyday-life approach, which is gaining popularity in Russian scholarship. The study showed that the daily work of the censors was tense, and that domestic problems made the psychological climate in the team unstable. The power structures did not support the party organizations of censorship workers. Censors' appeals to the leaders of party committees and government organizations had no positive effect. Financial difficulties prevented the censorship apparatus from working in a coordinated way: a censor's wages were barely enough to pay for housing and its upkeep. Ural censors received no benefits in the 1930s or early 1940s. In most cases the party component of censorship policy remained a formality. The article notes that the party committees did not have a clear understanding of the tasks of political censorship. At times party committees even intruded into the work of preliminary control and prevented the censors from performing their duties. The authorities' inattention to the censors' needs demoralized them, hurt their feelings and, most importantly, dashed their hopes of gaining access to the benefits of a privileged stratum of society. Deriving no moral satisfaction from the status of the profession and constantly experiencing financial hardship, good specialists left the censorship agency.
The study concluded that the continual turnover of staff in the Ural obllit apparatus in the second half of the 1930s and the early 1940s can be attributed to the so-called "human factor". Meeting everyday needs still took precedence over professional interests. Some problems remain outside the scope of this research. Censorship control of the press in the Urals requires a separate comprehensive study, as does the role of the censors' personal biases in their work. The attitudes of Ural writers, artists and others towards the activity of the censorship department and of individual censors also remain to be studied.
Languages borrow words when there is a need for it, all languages contain loanwords, and no part of the lexicon is entirely "loan-proof". These are statements about lexical borrowing that are typically found in linguistic textbooks (Hock and Joseph 1996). Further, we know that there are large discrepancies in the borrowability of different lexical concepts: core vocabulary domains (sense perception, spatial relations, the body, kinship, and motion) are in general more resistant to borrowing, whereas culture-dependent domains (religion and belief, clothing and grooming, the house, law, social and political relations, agriculture and vegetation) belong to a more loan-intense part of the lexicon. We are also aware that there are large differences in borrowability between languages, which depend on multiple factors, including language history, population size, language contact, grammatical structure, and so forth (Haspelmath and Tadmor 2009).
Our study aims at investigating borrowability more carefully from a historical perspective, using quantitative and statistical methods, focusing on the families of Indo-European, Nakh-Dagestanian, Northwest Caucasian, Kartvelian, Uralic, and Turkic. We use a lexical dataset which contains 100 lexical meanings each from the domains of basic vocabulary (Swadesh) and culture vocabulary (domains of agriculture, vegetation, food, warfare, hunting, animals, technology), compiled from around 250 languages of the previously mentioned families (around 25,000 lexemes in total). The lexemes of the dataset have been coded (manually, from dictionaries) for cognacy according to a tree model and are distinguished by borrowing versus inheritance at every historical stage in the tree structure. The dataset also systematically includes ancient language forms (including reconstructed ones), and traces the continued development of cognates, involving further semantic change. Connected to the dataset, there is language metadata that includes geographic extension, time period, relative population size and family tree topology. Taken together, this makes the dataset a unique and as yet unexplored source for investigating large-scale borrowability statistically. In our study, we specifically address the following research questions:
• What is the general level of borrowability in our vocabulary (culture words), compared to the average borrowability of the same lexical meanings found in the cross-linguistic study of Haspelmath and Tadmor (2009)?
• In our data, are there any internal differences between the borrowability of lexical concepts, depending on semantic domain?
• Is there a general connection between loanword directionality and the population size of languages?
• Is the amount of borrowing generally equal over time (gradually increasing with increasing amounts of documentation), or are some periods and geographic areas more intense in borrowing?
• What happens to lexemes upon borrowing? Do they continue to change with the language, or are they more likely to be frozen, semantically and/or morphologically?
Haspelmath, M. and U. Tadmor (2009). Loanwords in the world's languages: a comparative handbook. Berlin, Mouton de Gruyter.
Hock, H. H. and B. D. Joseph (1996). Language history, language change, and language relationship: an introduction to historical and comparative linguistics. Berlin, Mouton de Gruyter.
Preface
RDHum 2019, the Research Data and Humanities Conference, takes place August 14–16, 2019 at the University of Oulu, Finland. RDHum 2019 is jointly organised by the University of Oulu and the University of Jyväskylä, in collaboration with FIN-CLARIN and The Language Bank of Finland. The event is the first in a series of conferences taking place biennially in one of the universities within the FIN-CLARIN Consortium. The first RDHum Conference is hosted by the University of Oulu, where the Oulu Corpus, a comprehensive and widely used digital research resource at the time, was collected and compiled in a project led by professor Pauli Saukkonen 50 years ago.
Digital resources and technology are used more and more within the humanities and the social sciences. Researchers in digital humanities gather, administer, share and study rapidly accumulating digital resources. They also need various research methods and tools for analysing these resources. The Research Data and Humanities conference gathers researchers around these themes, and the scientific program of the Conference includes numerous topics related to digital data, digital methods and analysis in the Humanities. In this first Conference, the subjects of the presentations, posters and workshops come from several disciplines, such as linguistics, literary studies, computer science and information science. Thus the languages and societal phenomena under study, as well as the data and methods, vary widely in the conference. The peer-reviewed articles published in these proceedings are grouped into three categories according to their main focus: data, methods and tools.
New data and corpora are presented in the following papers: Kurki et al. present Digilang, a joint venture to combine six different digital corpora. The corpora represent different kinds of data in various modalities. Ijaz seeks to determine editions analytically from bibliographic metadata. Lahti et al. describe the use of bibliographic data science in the study of bibliographic metadata collections. Pääkkönen presents the challenges end users face with digital presentation systems and discusses issues relating to metadata. Salonen et al. describe the collection and the process of establishing the Corpus of Finland's Sign Languages. They also discuss the storage, metadata and publication of the corpus. Jauhiainen presents Wanca in Korp, a sentence corpus for under-resourced Uralic languages, and the process by which the corpus was collected.
New methods for digital humanities are presented in the following papers: Laippala discusses how to classify texts collected from the internet by means of automatic identification. Ryynänen and Hyyryläinen analyze the concept of Digital Humanities and propose the concept of "practical digital humanities" to describe research that takes a humanist approach to practical problem solving with digital technology development in the digital humanities context. Mikhailov compares texts by their frequency lists. He conducts an experiment with two types of word frequency lists, unlemmatized and lemmatized, and observes how the outcomes differ between the two. Ivaska presents an analysis of using machine learning to distinguish translated from non-translated Finnish texts and to identify the source language of a translated text. Drobac and Lindén discuss issues relating to optical character recognition (OCR) in historical newspaper and journal text and assert that font families need to be recognized. They present an experiment on recognizing text in two different fonts. Cohrs and Petersen propose experimental methods for guessing a person's political party based on their tweets. Ijaz presents possibilities for the analytical determination of editions from bibliographic metadata.
Pääkkönen, Kettunen and Kervinen discuss findings from user observations on searching digitized serial publications.
The following papers introduce new tools in digital humanities: Kettunen presents an analysis of semantic annotation of texts in the context of other automated tools for analyzing languages. He introduces a new tool, FiST, which has been developed for the semantic annotation of texts in Finnish. Huttunen describes the use of digital games in reinforcing the linguistic and socioemotional skills of children with communicative disabilities. She describes the properties of two versions of the game Tunne-etsivät and the collection of research data from the users of the game.
We gratefully acknowledge the financial and technical support from The Federation of Finnish Learned Societies, the FIN-CLARIN consortium, The Language Bank of Finland, the Universities of Oulu and Jyväskylä, and the city of Oulu, which made this event possible. We would also like to thank all the members of the FIN-CLARIN steering group, the members of the scientific and organising committees, and the local students at the University of Oulu, who encouraged us to organise this event and worked hard to make this conference a reality. Finally, we wish to thank all reviewers for their work and the Faculty of Humanities for agreeing to publish the proceedings in Studia Humaniora Ouluensia.
Jarmo Harri Jantunen (chair), Sisko Brunni, Niina Kunnas, Santeri Palviainen and Katja Västi ;
Table of contents
Preface
Table of contents
I Data
Analytical determination of editions from bibliographic metadata. Ali Zeeshan Ijaz, Mikko Tolonen, Leo Lahti and Iiro Tiihonen
Wanca in Korp: Text corpora for underresourced Uralic languages. Heidi Jauhiainen, Tommi Jauhiainen and Krister Lindén
Digilang – Turun yliopiston digitaalisia kieliaineistoja kehittämässä. Tommi Kurki, Nobufumi Inaba, Annekatrin Kaivapalu, Maarit Koponen, Veronika Laippala, Christophe Leblay, Jorma Luutonen, Maarit Mutta, Markku Nikulin and Elisa Reunanen
Best Practices in Bibliographic Data Science. Leo Lahti, Ville Vaara, Jani Marjanen and Mikko Tolonen
Digital heritage presentation system development + new material types: early findings. Tuula Pääkkönen
Suomen viittomakielten korpusta rakentamassa. Juhana Salonen, Anna Puupponen, Ritva Takkinen and Tommi Jantunen
II Methods
Guessing a tweet author's political party using weighted n-gram models. Enum Cohrs and Wiebke Petersen
Optical font family recognition using a neural network. Senka Drobac and Krister Lindén
Distinguishing translations from non-translations and identifying (in)direct translations' source languages. Laura Ivaska
From bits and numbers to explanations – doing research on Internet-based big data. Veronika Laippala
The Extent of Similarity: comparing texts by their frequency lists. Mikhail Mikhailov
Search options used in digitized serial publications – observational user data and future challenges. Tuula Pääkkönen, Kimmo Kettunen and Jukka Kervinen
Border crossing and trespassing? Expanding digital humanities research to developing peripheries with the novel digital technologies. Toni Ryynänen and Torsti Hyyryläinen
III Tools
Tutkimusaineiston kerääminen ja analysointi monipuolisia digitaalisia keinoja hyödyntäen. Esimerkkinä Tunne-etsivät-tutkimushankekokonaisuus. Kerttu Huttunen
Kirjoitetun nykysuomen automaattisesta semanttisesta merkitsemisestä. Kimmo Kettunen