Shang-Jui Ray Kuo
PhD Student, Computer Science · Stony Brook University · SPELL Lab
I am a PhD researcher at the SPELL Lab (Synthetic Perception and Learning Lab), Stony Brook University, advised by Prof. Paola Cascante-Bonilla. My research focuses on fundamental questions about the architectural choices AI systems have inherited and built on. Many of those choices were made when the practical constraints looked very different, and I work on rigorously comparing the defaults against the alternatives, whether those alternatives already exist or have to be designed for the comparison.
My current work runs concurrent threads on both sides of the vision-language interface. On the vision side, I ask whether Vision Transformers are actually the right backbone for VLMs, or whether this is a historical default worth revisiting. On the language side, I ask whether the standard tokenization pipeline is the right input interface, particularly for writing systems it was never designed for. Underlying both is a deeper question: are there more natural ways for AI systems to process the modalities they take in, ways that fit both the structure of the data and the modern computing systems these models run on? My background in hardware and systems shapes how I work at this boundary.
Before starting my PhD, I was an AI researcher and AI accelerator engineer at Inventec Corporation in Taipei, working on medical image segmentation and NPU IP design. I received my B.S. in Electrical Engineering from National Taiwan University in 2023.
Recent. Our VLM-SSM vision-encoder paper was featured on Hugging Face Daily Papers, and the Hugging Face open-source team offered a ZeroGPU (A100) grant for a public demo. The work was also accepted as a poster at the SUNY AI Symposium 2026.
Research interests. Vision-language models · state space models as vision encoders · tokenization and learned input representations · multimodal learning · hardware-AI co-design.