Roman Koshkin, Katsuhito Sudoh & Satoshi Nakamura (2024) LLMs Are Zero-Shot Context-Aware Simultaneous Translators. arXiv:2406.13476.
The advent of transformers has fueled progress in machine translation. More recently, large language models (LLMs) have come into the spotlight thanks to their generality and strong performance across a wide range of language tasks, including translation. Here we show that open-source LLMs perform on par with or better than some state-of-the-art baselines in simultaneous machine translation (SiMT) tasks, zero-shot. We also demonstrate that injecting minimal background information, which is easy with an LLM, brings further performance gains, especially on challenging technical subject matter. This highlights the potential of LLMs for building the next generation of massively multilingual, context-aware and terminologically accurate SiMT systems that require no resource-intensive training or fine-tuning.
Decoder-only large language models (LLMs) have recently demonstrated impressive capabilities in text generation and reasoning. Nonetheless, they have limited applications in simultaneous machine translation (SiMT), currently dominated by encoder-decoder transformers. This study demonstrates that, after fine-tuning on a small dataset comprising causally aligned source and target sentence pairs, a pre-trained open-source LLM can control input segmentation directly by generating a special 'wait' token. This obviates the need for a separate policy and enables the LLM to perform English-German and English-Russian SiMT tasks with BLEU scores that are comparable to those of specific state-of-the-art baselines. We also evaluated closed-source models such as GPT-4, which displayed encouraging results in performing the SiMT task without prior training (zero-shot), indicating a promising avenue for enhancing future SiMT systems.
Peer-reviewed Publications
Roman Koshkin, Tomoki Fukai (2024) convSeq: Fast and Scalable Method for Detecting Patterns in Spike Data. ICML, PMLR 235, 2024.
Spontaneous neural activity, crucial in memory, learning, and spatial navigation, often manifests itself as repetitive spatiotemporal patterns. Despite their importance, analyzing these patterns in large neural recordings remains challenging due to a lack of efficient and scalable detection methods. Addressing this gap, we introduce convSeq, an unsupervised method that employs backpropagation for optimizing spatiotemporal filters that effectively identify these neural patterns. Our method's performance is validated on various synthetic data and real neural recordings, revealing spike sequences with unprecedented scalability and efficiency. Significantly surpassing existing methods in speed, convSeq sets a new standard for analyzing spontaneous neural activity, potentially advancing our understanding of information processing in neural circuits.
Roman Koshkin, Tomoki Fukai (2023) Unsupervised Detection of Cell Assemblies with Graph Neural Networks. ICLR (Tiny Papers Track).
Cell assemblies, putative units of neural computation, manifest themselves as repeating and temporally coordinated activity of neurons. However, understanding of their role in brain function is hampered by a lack of scalable methods for their unsupervised detection. We propose using a graph neural network for embedding spike data into a sequence of fixed size vectors and clustering them based on their self-similarity across time. We validate our method on synthetic data and real neural recordings.
Roman Koshkin, Yury Shtyrov, Andriy Myachykov & Alex Ossadtchi (2020) Testing the Efforts Model of Simultaneous Interpreting. PLoS ONE 13(10):100-120.
We utilized the event-related potential (ERP) technique to study neural activity associated with different levels of working memory (WM) load during simultaneous interpretation (SI) of continuous prose. The amplitude of N1 and P1 components elicited by task-irrelevant tone probes was significantly modulated as a function of WM load but not the direction of interpretation. Furthermore, the latency of the P1 increased significantly with WM load. The WM load effect on N1 latency, however, did not reach significance. Larger negativity under lower WM loads suggests that more attention is available to process the source message, providing the first electrophysiological evidence in support of the Efforts Model of SI. Relationships between the direction of interpretation and median WM load are also discussed.
Roman Koshkin & Alex Ossadtchi (2020) Commentary: Functional Connectivity in the Left Dorsal Stream Facilitates Simultaneous Language Translation: An EEG Study. Frontiers in Human Neuroscience 11(2):273.
Patents
Roman Koshkin, Mariya Volodina, Alex Ossadtchi (2020) Electroencephalographic method and system of objective estimation of listeners' reaction to audio content based on a range of voluntary affective categories. RF Patent 2747571.
The invention relates to marketing technologies and allows, based on the psychophysiological state measured using an electroencephalograph, an objective assessment of the listener's reaction to audio content according to an arbitrary range of affective categories (such as interest in, emotional response to, or agreement/disagreement with the played content) when conducting opinion polls and focus groups, as well as in educational settings. The technical result achieved is an objective psychophysiological metric that allows one to judge how attentively the presented audio content was listened to, as well as its rating (in comparison with alternative audio content), without the use of questionnaires, surveys or other methods prone to cognitive biases. The technical result is achieved by optimizing the decision rule of decision trees on the psychophysiological state, measured using an electroencephalograph while a group of listeners listens to reference audio messages selected from a pre-created database of reference audio messages and their corresponding affective ratings (an integral part of the database) collected in a preliminary survey of a large group of respondents.
Conference Proceedings
Working Memory Load in Simultaneous Language Interpretation: An ERP Study
Roman Koshkin & Alex Ossadtchi
2017, Moscow, Russia
We utilized the event-related potential (ERP) technique to study neural activity associated with different levels of working memory (WM) load during simultaneous language interpretation (SLI). We pioneered the use of the technique on conference interpreters articulating overtly. The amplitude of the N1 component elicited by task-irrelevant tone probes was significantly modulated as a function of WM load but not the direction of interpretation. The N1 amplitude decreased with load, suggesting shallower processing under high WM load regardless of the direction.
Presentations and Talks
Testing One Aspect of the Efforts Model of Simultaneous Interpreting: An ERP Study
Roman Koshkin, Yury Shtyrov, & Alex Ossadtchi
October 27-29, 2017, Saint-Petersburg, Russia
Due to the inherent complexity of simultaneous interpreting (SI), testing its theoretical models in a simple but ecologically valid experimental paradigm has been problematic. We attempted to overcome some of the associated challenges using a novel method for WM load estimation in conjunction with the event-related potential (ERP) technique. Specifically, we tested the prediction of the Efforts Model of SI (Gile, 1988) that increased WM load impairs the processing of the source message during SI. Consistent with the model, the N1 amplitude, which we used as an index of attention, was significantly modulated as a function of WM load. Negativity in the N1 range decreased at higher levels of WM load, suggesting shallower processing of the source message under high WM load. Our findings represent the first electrophysiological evidence in support of the Efforts Model.
N1 ERP As an Index of Depth of Processing In Simultaneous Interpreting
Roman Koshkin, Yury Shtyrov, & Alex Ossadtchi
September 28-29, 2016, Moscow, Russia
Researchers have extensively studied simultaneous interpretation (SI), but understanding how the brain manages the limited resources of the attention and working memory (WM) systems under such extreme conditions of language control has remained elusive. Here, we use the event-related potential (ERP) technique to investigate the interplay of attention and WM load during simultaneous interpretation of real speech in an ecologically valid overt production paradigm. Previous research using simple dichotic listening paradigms showed the N1 ERP component to be modulated by attention, making it a suitable, temporally precise index of attention. Specifically, we test the hypothesis that at larger WM loads, attention to the source message is markedly reduced as it gets redeployed towards processing the backlog of previous information. If this is borne out, translation fidelity is more likely to be compromised during periods of higher WM load. Additionally, we examine whether concurrent articulation degrades attention to, and processing of, the source message in SI. Although such a setup requires non-trivial artifact correction techniques that push the limits of the ERP method, the pattern of results based on one participant's data suggests an effect of WM load on the N1 amplitude, in agreement with our original hypothesis.
Working Memory Load In Simultaneous Language Interpretation: An ERP Study
Roman Koshkin, Alex Ossadtchi & Yury Shtyrov
June 26-30, 2017, Saint Petersburg, Russia
We utilized the event-related potential (ERP) technique to study neural activity associated with different levels of working memory (WM) load during simultaneous language interpretation (SLI). We pioneered the use of the technique on conference interpreters articulating overtly. The amplitude of the N1 component elicited by task-irrelevant tone probes was significantly modulated as a function of WM load but not the direction of interpretation. The N1 amplitude decreased with WM load, suggesting shallower processing under high WM load regardless of the direction. Using our novel projection-based method, we identified otherwise hidden WM load-dependent regularities in the P3 range. The results are discussed in terms of the Efforts Model of simultaneous language interpreting.
Localizing Hidden Regularities With Known Temporal Structure in the EEG Evoked Response Data
Alexandra Kuznetsova, Roman Koshkin, & Alex Ossadtchi
June 26-30, 2017, Saint Petersburg, Russia
We describe a novel data-driven spatial filtering technique that can be applied to evoked potentials in EEG data to find statistically significant hidden differential activations that cannot be found by standard single-channel analysis. The underlying optimization problem is formulated as a generalized Rayleigh quotient maximization problem. The technique is based on the known morphological characteristics of the response: the optimal filter maximizes the difference in the target interval, when the component typically occurs, while minimizing the difference in the flanker interval. The technique is equipped with a randomization-based statistical test to assess the significance of the discovered phenomenon. The performance of the proposed method was evaluated on simulated ERP data, and the results were compared with a competing ICA-based method. Furthermore, we describe applications of the proposed method to EEG data acquired in two studies: one on simultaneous language interpreting (group analysis) and one on auditory neuroplasticity (single-subject application). We show how the differential components can be detected after filtering and support our results with a permutation statistical test, topography analysis and single-trial evidence.
Does High Working Memory Load Disrupt Listening in Simultaneous Interpreting?
Roman Koshkin, Yury Shtyrov, Alex Ossadtchi
April 27, 2017, Higher School of Economics, Moscow, Russia