Computational Linguistics Lab

Research Focus

Founded in 2023, our Computational Linguistics Laboratory at Boğaziçi University focuses on natural language processing, deep learning, and computational social science. We develop advanced models and tools to analyze language, from historical texts to modern applications, bridging technology and linguistics.

News

  • Our recent work titled "Automated Screening of Antibacterial Nanoparticle Literature: Dataset Curation and Model Evaluation", co-authored with Alperen Öztürk, E. Sümeyra Turali-Emre, and colleagues from the University of Michigan, has been accepted to the EACL 2026 Main Conference! You can access the paper here.
  • Dilara Zeynep Gürer has been awarded the Best Thesis Award at NAACL SRW 2025 for her groundbreaking thesis proposal titled "Text Extraction and Script Completion in Images of Arabic Script-Based Calligraphy". Congratulations to Dilara! You can access the paper here.
  • Ece Elif Adak won first place in SMM4H-HeaRD 2025 Shared Task 3, successfully detecting dementia family caregivers on Twitter. Awesome work, Elif! You can access the paper here.
  • Dilara Zeynep Gürer's paper, based on her thesis research, has been accepted to the NAACL 2025 Student Research Workshop! Congratulations, Dilara!
  • Efe Eren Genç has been accepted into the MS program at Saarland University! Wishing you the best on this new journey, Efe!
  • We’ve launched our Hugging Face page, featuring various models and resources for historical Turkish NLP. Check it out, and for more details, read the preprint version of our paper!

Academics

  • Şaziye Betül Özateş - Head of BUCOLIN LAB
  • Ümit Atlamaz - Co-Supervisor
  • Ercan Atam - Co-Supervisor

PhD Students

  • Dilara Zeynep Gürer - Advisors: Ümit Atlamaz, Şaziye Betül Özateş
  • Nureddin Cüneyd Ünal - Advisor: Şaziye Betül Özateş

Master's Students

  • Ece Elif Adak - Advisor: Şaziye Betül Özateş
  • Tugay Balatlı - Advisors: Ercan Atam, Şaziye Betül Özateş
  • Necip Fazıl Ergün - Advisor: Şaziye Betül Özateş
  • Kutsal Gündüz - Advisor: Şaziye Betül Özateş
  • Zeynep Karaman - Advisors: Ercan Atam, Şaziye Betül Özateş
  • Enes Köser - Advisor: Şaziye Betül Özateş
  • Elifnur Polat - Advisors: Şaziye Betül Özateş, Ercan Atam
  • Mustafa Alperen Öztürk - Advisor: Şaziye Betül Özateş
  • Kevser Taştan - Advisor: Şaziye Betül Özateş
  • Şükrü Onur Yiğit - Advisor: Şaziye Betül Özateş

Bachelor's Students

  • Tarık Emre Tıraş - Dept. of Linguistics
  • Ada Cengiz - Dept. of Linguistics

High School Students

  • Ece Yurtseven - Robert College
  • Baki Berkay Altunkaynak - Huseyin Avni Sozen Anatolian High School (IB Programme)

Ongoing Projects

  • Continual Pre-training of Large Language Models for Historical Language Understanding (2025 - )
  • Deep Learning-Based Extraction of Antibacterial Nanoparticle Activity in Nanomedicine Literature - with E. S. Turalı-Emre
  • Automatic Processing and Analysis of Kazasker Ruznamçe Records with Digital Methods (TÜBİTAK 3005, 2024 - ) - with E. F. B. Taşdemir
  • Analysis of Calligraphy Artworks Using Artificial Intelligence: Understanding Cultural Heritage and Accessibility (TÜBİTAK 3005, 2026 - ) - with D. Z. Gürer

Completed Projects

  • Deep Learning-based Exploration of Linguistic Structures and Semantic Entities in Historical Turkish Texts (BAP, 2024-2026)
      • Dilara Zeynep Gürer, Esma F. Bilgin Taşdemir, and Şaziye Betül Özateş. 2026. RuznamceNER: Named Entity Recognition Dataset for Ottoman Turkish. LREC 2026.
      • Tarık Emre Tıraş, N. Cüneyd Ünal, Ada Cengiz, Ece Yurtseven, Esma F. Bilgin Taşdemir, and Şaziye Betül Özateş. 2026. OTA-BOUN: A Historical Turkish Dependency Treebank. LREC 2026.
  • Development of a Deep Learning-Based Data Expansion Tool for Predicting the Efficacy of Antibacterial Nanoparticles Used in Nanomedicine (TÜBİTAK BİLGEM, 2024-2025) - with E. S. Turalı-Emre
      • Alperen Ozturk, Şaziye Betül Özateş, Sophia Bahar Root, Angela Violi, Nicholas Kotov, J. Scott VanEpps, and Emine Sumeyra Turali Emre. 2026. Automated Screening of Antibacterial Nanoparticle Literature: Dataset Curation and Model Evaluation. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 454–465, Rabat, Morocco.
  • Building Natural Language Processing Resources for Ottoman Turkish (2023 - 2025)
      • Şaziye Betül Özateş, Tarık Emre Tıraş, Ece Elif Adak, Berat Doğan, Fatih Burak Karagöz, Efe Eren Genç, and Esma F. Bilgin Taşdemir. 2025. Building Foundations for Natural Language Processing of Historical Turkish: Resources and Models. arXiv preprint arXiv:2501.04828.
      • Şaziye Betül Özateş, Tarık Emre Tıraş, Efe Eren Genç, and Esma F. Bilgin Taşdemir. 2024. Dependency Annotation of Ottoman Turkish with Multilingual BERT. In Proceedings of The 18th Linguistic Annotation Workshop (LAW-XVIII), pages 188–196, St. Julians, Malta. Association for Computational Linguistics.
      • Fatih Burak Karagöz, Berat Doğan, and Şaziye Betül Özateş. 2024. Towards a Clean Text Corpus for Ottoman Turkish. In Proceedings of the First Workshop on Natural Language Processing for Turkic Languages (SIGTURK 2024), accepted. Association for Computational Linguistics.