CV

Academic CV. The PDF download icon at the top right links to the latest LaTeX-rendered version (compact, optimized for print and for recruiters screening PDFs).

Contact Information

Name Shang-Jui Ray Kuo
Professional Title PhD Student in Computer Science, Stony Brook University
Email raykuo.sj@gmail.com
Phone +1-934-246-1740

Professional Summary

PhD researcher at Stony Brook University’s SPELL Lab (advised by Prof. Paola Cascante-Bonilla), working on vision-language models. I re-examine the architectural defaults AI systems have inherited and built on — currently on both sides of the VLM interface (state space models as vision encoders; learned alternatives to tokenization on the language side).

Experience

  • 2025 - Present

    Stony Brook, NY

    Graduate Researcher
    Stony Brook University — SPELL Lab
    Vision-language research under Prof. Paola Cascante-Bonilla.
    • Designed a strictly controlled LLaVA-style backbone-swap study (frozen vision tower, fixed Vicuna-7B with a 2-layer MLP connector, matched ImageNet-1K initialization) showing that VMamba (pure SSM, ~30–89M params) outperforms ViT-family backbones of up to ~662M params on RefCOCO / RefCOCO+ / RefCOCOg while remaining competitive on open-ended VQA.
    • Showed that ImageNet accuracy and naive backbone scaling are unreliable predictors of downstream VLM performance.
    • Diagnosed ‘localization collapse’ in some detection-pretrained checkpoints as a vision–language interface failure (not an architectural one); proposed simple stabilizations (a 3-layer MLP connector plus square input geometry) that recover collapsed localization to near-baseline.
  • 2025

    Stony Brook, NY

    Research Assistant
    Stony Brook University — COMPAS Lab
    ML-ISA hardware-portability bring-up under Prof. Mike Ferdman and Prof. Peter Milder.
    • Retargeted an existing PyTorch-to-NPU pipeline (custom RTL NPU executing the lab’s ML-ISA framework natively) to AMD AI Engine cores on the VCK5000 FPGA via runtime-only modifications — demonstrating the framework’s hardware-portability claim across heterogeneous accelerators.
    • Invented an RTL-level on-chip debug methodology with configurable debug registers under broken-toolchain conditions (no working hardware simulator; 8-hour bitstream compile cycles) and brought up the lab’s first working matmul kernel on AIE.
  • 2023 - 2024

    Taipei, Taiwan

    AI Researcher & AI Accelerator Engineer
    Inventec Corporation — AI Center & Digital Center (AI on Chip)
    NPU IP commercialization and applied AI deployment.
    • Contributed to a commercial NPU IP product (VectorMesh™) during initial commercialization: designed an RTL image-resize DSP module and built an NPU co-simulation and verification environment. The product later won the EETimes 2024 Asia Golden Award for Best IP/Processor (after my departure).
    • Designed a lightweight CNN (10–50K params) for stylus trajectory regression from capacitive sensors, achieving sub-2-pixel accuracy and ~50% error reduction; deployed in the panel IC of a top IC design house.
    • Co-authored the ICASSP 2024 paper on cross-domain augmentation for foot-ulcer segmentation.

Research Interests

  • Architectural defaults at the input boundaries of modern AI systems. Vision encoders for VLMs (state space models vs Vision Transformers); tokenization (learned alternatives for writing systems the standard pipeline was never designed for); a hardware-and-systems perspective on how AI systems process the modalities they take in.

Education

  • 2024 - Present

    Stony Brook, NY

    Ph.D.
    Stony Brook University
    Computer Science
    • Coursework: CSE 537 Artificial Intelligence; CSE 538 Natural Language Processing; Topics in CS: Vision-Language Foundation Models
    • Advisor: Prof. Paola Cascante-Bonilla, SPELL Lab (Synthetic Perception and Learning Lab)
    • Expected completion: Spring 2029
  • 2019 - 2023

    Taipei, Taiwan

    B.S.
    National Taiwan University
    Electrical Engineering
    • Coursework: Deep Learning for Computer Vision*, Machine Learning*, Digital Signal Processing in VLSI Design*, Digital System Design, Integrated Circuit Design, Computer Architecture, Algorithms, Data Structures (* graduate-level)

Projects

  • Toward Tokenizer-Free Chinese Language Models

    Originated as a CSE 538 NLP final project (Spring 2026); continuing as a PhD research direction. Controlled comparison of three Chinese character-embedding families (char-ID lookup, IDS radical decomposition, rendered-glyph vision encoders) under a fixed ModernBERT-small backbone. First milestone of a longer thread reframing tokenization as learned token construction.

    • Structure-aware variants match the char-ID baseline on C-MTEB (31 frozen tasks).
    • Introduced a low-cost frozen-encoder pixel-reconstruction probe for OOV-character behavior.
  • Adaptive Chess Tutor: LLM Dual-LoRA Fine-tuning

    Course project (CSE 537, Spring 2025). TinyLlama-1.1B as a locally deployable chess tutor that combines move prediction and human-like explanation via dual-LoRA: programmatic SFT (60K Lichess samples via a python-chess verifier) for chess foundations, plus cognitive distillation from a Mixtral-8x7B teacher (10K samples) into a second LoRA layered on the frozen Phase 1 adapter.

    • Runtime α/β weighted-adapter blending for tunable trade-off between move quality (Stockfish Score Delta) and explanation quality (BERTScore F1).
    • Instructor flagged the project for possible conference submission.

Publications

Awards

  • 2023
    1st Place + Best Presentation
    Inventec Corporation

    300+ participants, 74 teams. Cross-disciplinary partnership with an electromagnetics PhD on AI-powered laptop-antenna sensing/beamforming with zero hardware modifications; led the AI side.

Talks

  • Apr 2026 — Poster, Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encoders. SUNY AI Symposium 2026 (SUNY-wide, externally reviewed).
  • May 2026 — Poster, Glyph-Aware Text Embedding for Chinese LMs. CSE 538 NLP poster session, Stony Brook University.
  • May 2025 — Poster, Adaptive Chess Tutoring. CSE 537 AI poster session, Stony Brook University.
  • Fall 2025 — Course presentation, FlashAttention-1. Topics in CS: Vision-Language Foundation Models seminar, Stony Brook University.
  • Dec 2023 — Guest lecture, Embedded Deep Neural Network Processing. National Taiwan University of Science and Technology (host: Prof. Sheng-Chang Juan); Ministry of Education Smart Chip Systems and Applications Course.

Teaching

  • Fall 2024 — Teaching Assistant. CSE 582: Data Structures and Algorithms, Stony Brook University, Department of Computer Science.
  • Spring 2025 — Teaching Assistant. CSE 320: System Fundamentals II, Stony Brook University, Department of Computer Science.
  • Dec 2023 — Guest Lecturer. Embedded Deep Neural Network Processing, National Taiwan University of Science and Technology. One of three Inventec instructors jointly credentialed for the 12-hour course; taught the Dec 12 session (3 hours). Material drew on Inventec AI-on-Chip team work.

Service

  • 2023 — Journal Reviewer. Computers & Graphics (Elsevier); ACM Transactions on Multimedia Computing, Communications and Applications (TOMM).

Skills

Programming languages (Master): Python, C++, Verilog / SystemVerilog
Machine learning (Master): PyTorch, Hugging Face (Transformers, PEFT, Datasets); State Space Models (Mamba family); LoRA / dual-branch LoRA; Knowledge distillation; MLM pretraining; Flash Attention; bf16 mixed-precision; H100/H200 cluster training (SLURM)
Hardware & systems (Master): RTL design; AMD VCK5000 (AI Engine bring-up); Vitis / Vivado; RISC-V; LLVM IR; Linux/bash, Git, Docker
Writing (Master): LaTeX

Languages

Mandarin Chinese: Native speaker
English: Professional working proficiency

References

  • Available upon request
