CV
Academic CV. The PDF download icon at top right links to the latest LaTeX-rendered version (compact, optimized for print and recruiter PDF screens).
Contact Information
| Name | Shang-Jui Ray Kuo |
| Professional Title | PhD Student in Computer Science, Stony Brook University |
| Email | raykuo.sj@gmail.com |
| Phone | +1-934-246-1740 |
Professional Summary
PhD researcher at Stony Brook University’s SPELL Lab (advised by Prof. Paola Cascante-Bonilla), working on vision-language models. I re-examine the architectural defaults that AI systems have inherited and built on, currently on both sides of the VLM interface: state space models as vision encoders, and learned alternatives to tokenization on the language side.
Experience
-
2025 - Present Stony Brook, NY
Graduate Researcher
Stony Brook University — SPELL Lab
Vision-language research under Prof. Paola Cascante-Bonilla.
- Designed a strictly controlled LLaVA-style backbone-swap (frozen vision tower, fixed Vicuna-7B + 2-MLP connector, matched ImageNet-1K initialization) showing VMamba (pure SSM, ~30–89M params) leads ViT-family backbones up to ~662M params on RefCOCO / RefCOCO+ / RefCOCOg while remaining competitive on open-ended VQA.
- Showed ImageNet accuracy and naive backbone scaling are unreliable predictors of downstream VLM performance.
- Diagnosed ‘localization collapse’ in some detection-pretrained checkpoints as a vision–language interface failure (not architectural); proposed simple stabilizations (3-MLP connector + square input geometry) recovering collapsed localization to near-baseline.
-
2025 Stony Brook, NY
Research Assistant
Stony Brook University — COMPAS Lab
ML-ISA hardware-portability bring-up under Prof. Mike Ferdman and Prof. Peter Milder.
- Retargeted an existing PyTorch-to-NPU pipeline (custom RTL NPU executing the lab’s ML-ISA framework natively) to AMD AI Engine cores on the VCK5000 FPGA via runtime-only modifications — demonstrating the framework’s hardware-portability claim across heterogeneous accelerators.
- Invented an RTL-level on-chip debug methodology with configurable debug registers under broken-toolchain conditions (no working hardware simulator; 8-hour bitstream compile cycles), and used it to bring up the lab’s first working matmul kernel on AIE.
-
2023 - 2024 Taipei, Taiwan
AI Researcher & AI Accelerator Engineer
Inventec Corporation — AI Center & Digital Center (AI on Chip)
NPU IP commercialization and applied AI deployment.
- Contributed to a commercial NPU IP product (VectorMesh™) during its initial commercialization: designed an RTL image-resize DSP module and built the NPU co-simulation and verification environment. The product later won the EETimes 2024 Asia Golden Award for Best IP/Processor (after my departure).
- Designed a lightweight CNN (10–50K params) for stylus trajectory regression from capacitive sensors, achieving sub-2-pixel accuracy and ~50% error reduction; deployed in the panel IC of a top IC design house.
- Co-authored the ICASSP 2024 paper on cross-domain augmentation for foot-ulcer segmentation.
Research Interests
- Architectural defaults at the input boundaries of modern AI systems. Vision encoders for VLMs (state space models vs Vision Transformers); tokenization (learned alternatives for writing systems the standard pipeline was never designed for); a hardware-and-systems perspective on how AI systems process the modalities they take in.
Education
-
2024 - Present Stony Brook, NY
Ph.D.
Stony Brook University
Computer Science
- CSE 537 Artificial Intelligence; Topics in CS: Vision-Language Foundation Models; CSE 538 Natural Language Processing
- Advisor: Prof. Paola Cascante-Bonilla, SPELL Lab (Synthetic Perception and Learning Lab)
- Expected completion: Spring 2029
-
2019 - 2023 Taipei, Taiwan
B.S.
National Taiwan University
Electrical Engineering
- Deep Learning for Computer Vision*, Machine Learning*, Digital Signal Processing in VLSI Design*, Digital System Design, Integrated Circuit Design, Computer Architecture, Algorithms, Data Structures (* graduate-level)
Projects
-
Toward Tokenizer-Free Chinese Language Models
Originated as CSE 538 NLP final project (Spring 2026), continuing as a PhD research direction. Controlled comparison of three Chinese character-embedding families (char-ID lookup, IDS radical decomposition, rendered-glyph vision encoders) under a fixed ModernBERT-small backbone. First milestone of a longer thread reframing tokenization as learned token construction.
- Structure-aware variants match the char-ID baseline on C-MTEB (31 frozen tasks).
- Introduced a low-cost frozen-encoder pixel-reconstruction probe for OOV-character behavior.
-
Adaptive Chess Tutor: LLM Dual-LoRA Fine-tuning
Course project (CSE 537, Spring 2025). TinyLlama-1.1B as a locally deployable chess tutor that combines move prediction with human-like explanation via dual LoRA adapters: Phase 1 is programmatic SFT (60K Lichess samples via a python-chess verifier) for chess foundations; Phase 2 is cognitive distillation from a Mixtral-8x7B teacher (10K samples) into a second LoRA layered on the frozen Phase 1 adapter.
- Runtime α/β weighted-adapter blending for tunable trade-off between move quality (Stockfish Score Delta) and explanation quality (BERTScore F1).
- Instructor flagged the project for possible conference submission.
Publications
-
2026 Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encoders
Under Review (Featured on Hugging Face Daily Papers)
A controlled LLaVA-style backbone swap showing that VMamba (pure SSM) matches or beats much larger ViT-family encoders on grounding while remaining competitive on open-ended VQA. Poster accepted to SUNY AI Symposium 2026.
-
2024 Improving Limited Supervised Foot Ulcer Segmentation Using Cross-Domain Augmentation Strategies
IEEE ICASSP 2024
Two-stage cross-domain augmentation (TransMix) for limited-supervision foot-ulcer segmentation. Kuo and Huang contributed equally.
Awards
-
2023 1st Place + Best Presentation
Inventec Corporation
300+ participants, 74 teams. Cross-disciplinary partnership with an electromagnetics PhD on AI-powered laptop-antenna sensing/beamforming with zero hardware modifications; led the AI side.
Talks
- Apr 2026 — Poster, Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encoders. SUNY AI Symposium 2026 (SUNY-wide, externally reviewed).
- May 2026 — Poster, Glyph-Aware Text Embedding for Chinese LMs. CSE 538 NLP poster session, Stony Brook University.
- May 2025 — Poster, Adaptive Chess Tutoring. CSE 537 AI poster session, Stony Brook University.
- Fall 2025 — Course presentation, FlashAttention-1. Topics in CS: Vision-Language Foundation Models seminar, Stony Brook University.
- Dec 2023 — Guest lecture, Embedded Deep Neural Network Processing. National Taiwan University of Science and Technology (host — Prof. Sheng-Chang Juan); Ministry of Education Smart Chip Systems and Applications Course.
Teaching
- Fall 2024 — Teaching Assistant. CSE 582: Data Structures and Algorithms, Stony Brook University, Department of Computer Science.
- Spring 2025 — Teaching Assistant. CSE 320: System Fundamentals II, Stony Brook University, Department of Computer Science.
- Dec 2023 — Guest Lecturer. Embedded Deep Neural Network Processing, National Taiwan University of Science and Technology. One of three Inventec instructors jointly credentialed for the 12-hour course; taught the Dec 12 session (3 hours). Material drew on Inventec AI-on-Chip team work.
Service
- 2023 — Journal Reviewer. Computers & Graphics (Elsevier); ACM Transactions on Multimedia Computing, Communications and Applications (TOMM).
References
- Available upon request