CV | Shang-Jui Ray Kuo

Contact Information

Name	Shang-Jui Ray Kuo
Professional Title	PhD Student in Computer Science, Stony Brook University
Email	raykuo.sj@gmail.com
Phone	+1-934-246-1740

Professional Summary

PhD researcher at Stony Brook University’s SPELL Lab (advised by Prof. Paola Cascante-Bonilla), working on vision-language models. I re-examine the architectural defaults that AI systems have inherited and built on, currently on both sides of the VLM interface: state space models as vision encoders, and learned alternatives to tokenization on the language side.

Experience

2025 - Present

Stony Brook, NY
Graduate Researcher

Stony Brook University — SPELL Lab

Vision-language research under Prof. Paola Cascante-Bonilla.
- Designed a strictly controlled LLaVA-style backbone-swap (frozen vision tower, fixed Vicuna-7B + 2-MLP connector, matched ImageNet-1K initialization) showing VMamba (pure SSM, ~30–89M params) leads ViT-family backbones up to ~662M params on RefCOCO / RefCOCO+ / RefCOCOg while remaining competitive on open-ended VQA.
- Showed ImageNet accuracy and naive backbone scaling are unreliable predictors of downstream VLM performance.
- Diagnosed ‘localization collapse’ in some detection-pretrained checkpoints as a vision–language interface failure (not architectural); proposed simple stabilizations (3-MLP connector + square input geometry) recovering collapsed localization to near-baseline.
2025

Stony Brook, NY
Research Assistant

Stony Brook University — COMPAS Lab

ML-ISA hardware-portability bring-up under Prof. Mike Ferdman and Prof. Peter Milder.
- Retargeted an existing PyTorch-to-NPU pipeline (custom RTL NPU executing the lab’s ML-ISA framework natively) to AMD AI Engine cores on the VCK5000 FPGA via runtime-only modifications — demonstrating the framework’s hardware-portability claim across heterogeneous accelerators.
- Invented an RTL-level on-chip debug methodology with configurable debug registers under broken-toolchain conditions (no working hardware simulator; 8-hour bitstream compile cycles), and used it to bring up the lab’s first working matmul kernel on AIE.
2023 - 2024

Taipei, Taiwan
AI Researcher & AI Accelerator Engineer

Inventec Corporation — AI Center & Digital Center (AI on Chip)

NPU IP commercialization and applied AI deployment.
- Contributed to a commercial NPU IP product (VectorMesh™) during its initial commercialization: designed an RTL image-resize DSP module and built the NPU co-simulation and verification environment. The product later won the EETimes 2024 Asia Golden Award for Best IP/Processor (after my departure).
- Designed a lightweight CNN (10–50K params) for stylus trajectory regression from capacitive sensors, achieving sub-2-pixel accuracy and ~50% error reduction; deployed in a panel IC of a top IC design house.
- Co-authored an ICASSP 2024 paper on cross-domain augmentation for foot-ulcer segmentation.

Research Interests

Architectural defaults at the input boundaries of modern AI systems. Vision encoders for VLMs (state space models vs Vision Transformers); tokenization (learned alternatives for writing systems the standard pipeline was never designed for); a hardware-and-systems perspective on how AI systems process the modalities they take in.

Education

2024 - Present

Stony Brook, NY
Ph.D.

Stony Brook University

Computer Science
- CSE 537 Artificial Intelligence; Topics in CS: Vision-Language Foundation Models; CSE 538 Natural Language Processing
- Advisor: Prof. Paola Cascante-Bonilla, SPELL Lab (Synthetic Perception and Learning Lab)
- Expected completion: Spring 2029
2019 - 2023

Taipei, Taiwan
B.S.

National Taiwan University

Electrical Engineering
- Deep Learning for Computer Vision*, Machine Learning*, Digital Signal Processing in VLSI Design*, Digital System Design, Integrated Circuit Design, Computer Architecture, Algorithms, Data Structures (* graduate-level)

Projects

Toward Tokenizer-Free Chinese Language Models

Originated as a CSE 538 NLP final project (Spring 2026) and continues as a PhD research direction. Controlled comparison of three Chinese character-embedding families (char-ID lookup, IDS radical decomposition, rendered-glyph vision encoders) under a fixed ModernBERT-small backbone. First milestone of a longer thread reframing tokenization as learned token construction.
- Structure-aware variants match the char-ID baseline on C-MTEB (31 frozen tasks).
- Introduced a low-cost frozen-encoder pixel-reconstruction probe for OOV-character behavior.
Adaptive Chess Tutor: LLM Dual-LoRA Fine-tuning

Course project (CSE 537, Spring 2025). TinyLlama-1.1B as a locally-deployable chess tutor combining move prediction and human-like explanation via dual-LoRA: programmatic SFT (60K Lichess samples via python-chess verifier) for chess foundations + cognitive distillation from a Mixtral-8x7B teacher (10K samples) into a second LoRA layered on a frozen Phase 1 adapter.
- Runtime α/β weighted-adapter blending for tunable trade-off between move quality (Stockfish Score Delta) and explanation quality (BERTScore F1).
- Instructor flagged the project for possible conference submission.

Publications

2026

Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encoders

Under Review (Featured on Hugging Face Daily Papers)

Controlled LLaVA-style backbone-swap showing VMamba (pure SSM) matches/beats much larger ViT-family encoders on grounding while remaining competitive on open-ended VQA. Featured on HF Daily Papers; poster accepted to SUNY AI Symposium 2026.
2024

Improving Limited Supervised Foot Ulcer Segmentation Using Cross-Domain Augmentation Strategies

IEEE ICASSP 2024

Two-stage cross-domain augmentation (TransMix) for limited-supervision foot-ulcer segmentation. Kuo and Huang contributed equally.

Awards

2023

1st Place + Best Presentation

Inventec Corporation

300+ participants, 74 teams. Cross-disciplinary partnership with an electromagnetics PhD on AI-powered laptop-antenna sensing/beamforming with zero hardware modifications; led the AI side.

Talks

Apr 2026 — Poster, Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encoders. SUNY AI Symposium 2026 (SUNY-wide, externally reviewed).

May 2026 — Poster, Glyph-Aware Text Embedding for Chinese LMs. CSE 538 NLP poster session, Stony Brook University.

May 2025 — Poster, Adaptive Chess Tutoring. CSE 537 AI poster session, Stony Brook University.

Fall 2025 — Course presentation, FlashAttention-1. Topics in CS: Vision-Language Foundation Models seminar, Stony Brook University.

Dec 2023 — Guest lecture, Embedded Deep Neural Network Processing. National Taiwan University of Science and Technology (host — Prof. Sheng-Chang Juan); Ministry of Education Smart Chip Systems and Applications Course.

Teaching

Fall 2024 — Teaching Assistant. CSE 582: Data Structures and Algorithms, Stony Brook University, Department of Computer Science.

Spring 2025 — Teaching Assistant. CSE 320: System Fundamentals II, Stony Brook University, Department of Computer Science.

Dec 2023 — Guest Lecturer. Embedded Deep Neural Network Processing, National Taiwan University of Science and Technology. One of three Inventec instructors jointly credentialed for the 12-hour course; taught the Dec 12 session (3 hours). Material drew on Inventec AI-on-Chip team work.

Service

2023 — Journal Reviewer. Computers & Graphics (Elsevier); ACM Transactions on Multimedia Computing, Communications and Applications (TOMM).

Skills

Languages (Master): Python, C++, Verilog / SystemVerilog

Machine learning (Master): PyTorch, Hugging Face (Transformers, PEFT, Datasets); State Space Models (Mamba family); LoRA / dual-branch LoRA; Knowledge distillation; MLM pretraining; Flash Attention; bf16 mixed-precision; H100/H200 cluster training (SLURM)

Hardware & systems (Master): RTL design; AMD VCK5000 (AI Engine bring-up); Vitis / Vivado; RISC-V; LLVM IR; Linux/bash, Git, Docker

Writing (Master): LaTeX

Languages

Mandarin Chinese: Native speaker

English: Professional working proficiency

References

Available upon request
