Figure 4. Unique TDM features for gold standard and machine-translated corpora. Reading example: for French, the number of overlapping features is about 28,000, while the total number of features is about 33,000 for the machine-translated documents and about 38,000 for the gold standard documents.

When using bag-of-words models, it is customary to pre-process the data to reduce noise. In our case, we removed punctuation, numbers, and general stop words, and all remaining words were lowercased and stemmed. The pre-processing steps are identical for the gold standard texts and the machine-translated texts, and were applied to the translated texts. To perform these pre-processing steps, we used Python and R libraries. We used regular expressions in Python and the NLTK package (Bird, Klein and Loper 2009) for stemming, stop word removal, number removal, lowercasing, and punctuation removal. To create the term-document matrices (TDMs), we switched to R and the quanteda package (Benoit and Nulty 2013).
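The pre-processing steps and the feature-overlap count reported in Figure 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses only Python's standard library rather than NLTK and quanteda, `STOP_WORDS` is a hypothetical short stand-in for a full stop word list, stemming is an optional callable that defaults to no stemming, and the example sentences are invented.

```python
import re
from collections import Counter

# Hypothetical stand-in; a real general stop word list is much longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is"}


def preprocess(text, stem=lambda word: word, stop_words=STOP_WORDS):
    """Lowercase, strip punctuation and numbers, drop stop words, then stem."""
    text = text.lower()
    # Replace punctuation, digits, and underscores with spaces;
    # letters (including accented ones) are kept.
    text = re.sub(r"[^\w\s]|[\d_]", " ", text)
    return [stem(token) for token in text.split() if token not in stop_words]


def term_document_matrix(docs, **kwargs):
    """Return one term-frequency Counter per document (a sparse TDM)."""
    return [Counter(preprocess(doc, **kwargs)) for doc in docs]


def unique_features(tdm):
    """Set of distinct terms that occur anywhere in the TDM."""
    return set().union(*tdm)


# Invented example documents, for illustration only.
gold = ["The committee adopted 12 amendments.", "Members debated the budget."]
machine = ["The committee approved 12 amendments.", "Members discussed the budget."]

gold_features = unique_features(term_document_matrix(gold))
machine_features = unique_features(term_document_matrix(machine))
overlap = gold_features & machine_features

print(len(gold_features), len(machine_features), len(overlap))
```

To add stemming, a callable such as the `stem` method of NLTK's `PorterStemmer` can be passed as the `stem` argument; it is left out here so that the sketch runs without third-party packages.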
We compare the TDMs of the gold standard documents and the machine translations and also use them as inputs for the topic models described below. Readers most interested in our analysis of the TDMs may decide to proceed to the next section, which contains more technical details on the specification of our topic models.

The objective of this study is to analyze the Polish and English clauses of copyright transfer agreements in order to assess the quality of Google Translate's translations. The following research method was used: Matulewska, A. 2019. A presentation: Translation and translation of Polish legal texts into English at the 4th Conference Legal Translation, Judicial Interpretation and Comparative Legislation, 28-30 June 2019, Poznań. It is difficult for Google Translate to recognize and translate fossilized and archaic structures such as “here”, “beyond”, etc. They are translated as “from this contract”, “to this treaty”, etc. In this example, we also have grammatical deficiencies such as “reference in” or “transfer to indicate” (cf. .