Researcher · Builder · Founder · PhD in Artificial Intelligence
Human-AI Interaction & Dynamic Software

What I work on
Rethinking the interaction layer between humans and agentic systems. The SaC paradigm proposes that AI should generate live, evolving UIs as agentic applications rather than static textual responses.
A new class of software in which the frontend is generated on demand and evolves through interaction rather than being shipped as a fixed, pre-built artifact. The backend is an agent system, not a static codebase. (A toy sketch of this idea follows at the end of this section.)
Autonomous agents that navigate GUIs to complete tasks across apps and platforms — combining visual grounding, cross-app action execution, and non-intrusive automation without code instrumentation.
Parsing UI semantics from raw visual inputs and transforming static artifacts into live outputs: element detection, layout understanding, design-to-code, and form digitization.
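To make the generative-UI idea concrete, here is a toy sketch. All names are hypothetical, not the actual SaC implementation: an agent turn returns a declarative UI spec instead of prose, and a thin client-side renderer mounts it.

```python
# Toy sketch only: an agent turn that emits a declarative UI spec (JSON)
# instead of a text reply. Hypothetical names throughout, not the actual
# SaC implementation. A thin renderer would mount the spec and route UI
# events back to the agent, which patches the spec on later turns.
import json

def agent_turn(user_intent: str) -> dict:
    """Return a UI spec for the intent instead of a textual answer."""
    # A real system would have an LLM plan this structure; it is
    # hard-coded here for illustration.
    return {
        "type": "page",
        "title": f"Results: {user_intent}",
        "children": [
            {"type": "table", "id": "flights",
             "columns": ["Airline", "Depart", "Price"]},
            {"type": "button", "id": "book", "label": "Book selected",
             "on_click": "agent:book_flight"},  # event routes back to the agent
        ],
    }

print(json.dumps(agent_turn("flights to Sydney"), indent=2))
```

The UI is thus a live artifact the agent keeps rewriting, rather than a response it emits once.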
Featured projects

A new human-agent interaction paradigm through generative, on-demand, evolving applications.
An autonomous agent that explores any given piece of software and provides non-intrusive user assistance and automation. YC China 2025.
A mobile AI agent layer that understands apps and assists the user with app-intent mapping, step-by-step function guidance, and task automation.

A three-year research effort on software understanding and automation, undertaken before the agent era.

An unsupervised, vision-based approach to analysing the spatial and semantic relations between GUI elements and blocks.

Identifies the target UI component in any app from the user's natural-language description of what they want, then uses a robot arm to interact with the device automatically: a physical GUI agent before the agent era.
Experience & Education
Independent · Remote
Fellou · Hybrid
PortalX · Sydney, Australia
CSIRO's Data61 · Sydney, Australia
CSIRO's Data61 · Sydney, Australia
TF-AMD · Penang, Malaysia
Australian National University · Canberra, Australia
Australian National University · Canberra, Australia
Nanjing University of Science and Technology · Jiangsu, China
All Public Projects

A new human-agent interaction paradigm through generative, on-demand, evolving applications.

Co-founded Fellou, the world's first agentic browser; raised over $30M in funding.

An autonomous agent that explores any given piece of software and provides non-intrusive user assistance and automation. YC China 2025.


A mobile AI agent layer that understands apps and assists the user with app-intent mapping, step-by-step function guidance, and task automation.

Record once, replay anywhere. NiCro captures user actions on one device and re-executes them across iOS and Android at any screen size — purely vision-based, zero source code access required.
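A minimal sketch of the vision-only replay step, illustrative rather than NiCro's actual code: crop a patch around the recorded tap, then find it on the target device's screenshot with multi-scale template matching so different resolutions still resolve to the right point.

```python
# Illustrative only, not NiCro's actual code: locate the patch around a
# recorded tap on a target screenshot via multi-scale template matching,
# so differing resolutions and screen sizes map to the same on-screen spot.
import cv2
import numpy as np

def replay_tap(src_img, tap_xy, dst_img, patch=60):
    x, y = tap_xy
    tmpl = src_img[max(0, y - patch):y + patch, max(0, x - patch):x + patch]
    best_score, best_loc, best_scale = -1.0, None, 1.0
    for scale in np.linspace(0.5, 2.0, 16):        # bridge screen-size gaps
        t = cv2.resize(tmpl, None, fx=scale, fy=scale)
        if t.shape[0] > dst_img.shape[0] or t.shape[1] > dst_img.shape[1]:
            continue
        res = cv2.matchTemplate(dst_img, t, cv2.TM_CCOEFF_NORMED)
        _, score, _, loc = cv2.minMaxLoc(res)
        if score > best_score:
            best_score, best_loc, best_scale = score, loc, scale
    if best_loc is None:
        return None, 0.0
    h = int(tmpl.shape[0] * best_scale)
    w = int(tmpl.shape[1] * best_scale)
    # Centre of the matched patch = tap point on the target device.
    return (best_loc[0] + w // 2, best_loc[1] + h // 2), best_score
```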


A three-year research effort on software understanding and automation, undertaken before the agent era.

An unsupervised, vision-based approach to analysing the spatial and semantic relations between GUI elements and blocks.


What if AR glasses could understand the real world, not just overlay virtual objects onto it? This project integrates natural object detection with AR wearables — making the environment itself machine-readable.


Training-free real-time palm region detection and feature extraction from hand images — using classical image processing to achieve ~18fps without any annotated data or deep learning overhead.
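A rough OpenCV sketch in the spirit of that description (the thresholds are illustrative, not the project's tuned values): skin segmentation in YCrCb, morphological cleanup, then the largest contour's centroid as the palm-region seed.

```python
# Classical, training-free palm localisation sketch: skin segmentation in
# YCrCb, morphological cleanup, largest-contour centroid as the palm seed.
# Threshold values are illustrative assumptions.
import cv2
import numpy as np

def detect_palm_centre(frame_bgr):
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
    mask = cv2.inRange(ycrcb, (0, 135, 85), (255, 180, 135))  # rough skin band
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    hand = max(contours, key=cv2.contourArea)   # assume hand is largest blob
    m = cv2.moments(hand)
    if m["m00"] == 0:
        return None
    return int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])
```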


Identifies the target UI component in any app from the user's natural-language description of what they want, then uses a robot arm to interact with the device automatically: a physical GUI agent before the agent era.
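A simplified sketch of the grounding step, with pytesseract and difflib standing in for whatever the original system used (both are assumptions): OCR the screen, fuzzy-match the request against detected text, and hand the winning element's centre to the arm controller.

```python
# Simplified grounding sketch (pytesseract and difflib are stand-ins, not
# the original system's components): fuzzy-match the user's request against
# OCR'd on-screen text and return tap coordinates for the robot arm.
import difflib
import pytesseract
from pytesseract import Output

def ground_request(screenshot, request: str):
    ocr = pytesseract.image_to_data(screenshot, output_type=Output.DICT)
    best_xy, best_score = None, 0.0
    for i, word in enumerate(ocr["text"]):
        if not word.strip():
            continue
        score = difflib.SequenceMatcher(None, request.lower(),
                                        word.lower()).ratio()
        if score > best_score:
            best_score = score
            best_xy = (ocr["left"][i] + ocr["width"][i] // 2,
                       ocr["top"][i] + ocr["height"][i] // 2)
    return best_xy, best_score   # (x, y) for the arm to tap, plus confidence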

Humans don't see isolated buttons and labels — they see cards, lists, menus, and tabs. This project applies Gestalt psychology to automatically segment any GUI into perceptual layout blocks, the way a human would.
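A toy version of a single Gestalt cue, proximity (the project combines several cues into a proper hierarchy): greedily merge element boxes whose edge-to-edge gap falls under a threshold, so tightly packed elements collapse into one perceptual block.

```python
# Toy proximity grouping, one Gestalt cue only (the project uses several):
# merge boxes whose edge-to-edge gap is under a threshold so clustered
# elements collapse into one perceptual block such as a card or list row.
def gap(a, b):
    """Smallest edge-to-edge distance between boxes (x1, y1, x2, y2)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return max(dx, dy)

def group_by_proximity(boxes, threshold=12):
    blocks = [list(b) for b in boxes]
    merged = True
    while merged:                       # single-link agglomerative merging
        merged = False
        for i in range(len(blocks)):
            for j in range(i + 1, len(blocks)):
                if gap(blocks[i], blocks[j]) <= threshold:
                    a, b = blocks[i], blocks.pop(j)
                    blocks[i] = [min(a[0], b[0]), min(a[1], b[1]),
                                 max(a[2], b[2]), max(a[3], b[3])]
                    merged = True
                    break
            if merged:
                break
    return blocks          # each remaining box spans one perceptual block
```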

Snap a photo of any paper form — ezForm converts it into a fully interactive web form automatically, using computer vision to recognise every field, checkbox, and layout structure.
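The generation half only, sketched with a hard-coded field list (detecting fields on the photo is the CV part described above; names and the helper are placeholders):

```python
# Placeholder sketch of the form-generation step: detected fields go in,
# interactive HTML comes out. The field list is hard-coded here because
# field detection is the CV stage described in the blurb above.
def render_form(fields):
    rows = []
    for f in fields:
        if f["type"] == "checkbox":
            rows.append(f'<label><input type="checkbox" name="{f["name"]}"> '
                        f'{f["label"]}</label>')
        else:
            rows.append(f'<label>{f["label"]} '
                        f'<input type="text" name="{f["name"]}"></label>')
    return "<form>\n  " + "\n  ".join(rows) + "\n</form>"

print(render_form([
    {"type": "text", "name": "full_name", "label": "Full name"},
    {"type": "checkbox", "name": "subscribe", "label": "Subscribe to updates"},
]))
```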

Upload a UI screenshot, get back modular HTML, CSS, and React code — EasyD2C uses computer vision to reverse-engineer design images into structured, maintainable front-end code.


Unsupervised detection of UI elements from any GUI screenshot — no training data, no labels. UIED combines Google OCR for text and a CV+CNN pipeline for non-text elements, handling both mobile and desktop UIs.
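A compressed sketch of the non-text branch only (the published UIED pipeline adds Google OCR for text and a CNN to classify proposals): gradient binarisation followed by connected-component analysis to propose element boxes, with no training data involved.

```python
# Compressed sketch of a UIED-style non-text branch: gradient binarisation
# plus connected components to propose element bounding boxes. The real
# pipeline additionally runs OCR for text and a CNN over the proposals.
import cv2

def propose_elements(gray, min_area=100):
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    grad = cv2.morphologyEx(gray, cv2.MORPH_GRADIENT, kernel)
    _, binary = cv2.threshold(grad, 0, 255,
                              cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    n, _, stats, _ = cv2.connectedComponentsWithStats(binary)
    boxes = []
    for i in range(1, n):                      # label 0 is the background
        x, y, w, h, area = stats[i]
        if area >= min_area:
            boxes.append((x, y, x + w, y + h))
    return boxes
```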

Computer vision-based reverse engineering of UI design — automatically converts a GUI screenshot or mockup into working UI code and a structured element tree, bridging the designer-to-developer gap.


Detect and report land-use changes — vegetation growth, new construction, land clearing — by contrasting satellite images of the same region across time periods using computer vision.
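A minimal differencing baseline, assuming the two acquisitions are already co-registered (the full system would also handle registration and per-class change rules):

```python
# Minimal change-detection baseline: absolute difference between two
# co-registered grayscale acquisitions, Otsu thresholding, and contour
# extraction to report changed regions. Registration is assumed done.
import cv2

def changed_regions(before_gray, after_gray, min_area=500):
    diff = cv2.absdiff(before_gray, after_gray)
    diff = cv2.GaussianBlur(diff, (5, 5), 0)
    _, mask = cv2.threshold(diff, 0, 255,
                            cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours
            if cv2.contourArea(c) >= min_area]   # (x, y, w, h) per change
```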

Search across massive unstructured document databases — txt, PDF, Word — using keyword queries. Built on ElasticSearch with a user-friendly web interface.
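A representative query, assuming the Elasticsearch 8.x Python client; the index name and the "content" field are placeholders, not the project's actual schema.

```python
# Representative keyword query (index and field names are placeholder
# assumptions), using the Elasticsearch 8.x Python client: match on the
# extracted document text and return highlighted snippets for the web UI.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
resp = es.search(
    index="documents",
    query={"match": {"content": "quarterly revenue"}},
    highlight={"fields": {"content": {}}},    # snippets for the web UI
    size=10,
)
for hit in resp["hits"]["hits"]:
    print(hit["_source"].get("filename"), hit["_score"])
```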


Detect physical targets in natural environments and read the digital numbers on them — instructing an unmanned aerial vehicle to execute corresponding actions autonomously.


Detect coloured target regions in dynamic natural environments to instruct the UAV to complete actions — robust to lighting variation, motion blur, and complex backgrounds.
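The core detection step as an HSV-thresholding sketch; the robustness machinery (adaptive colour ranges, temporal smoothing) is omitted, and the green band shown is illustrative.

```python
# HSV colour-target detection sketch: threshold in HSV, clean the mask,
# and return the largest blob's centre as the UAV's steering cue. The
# colour band is an illustrative assumption, not the project's values.
import cv2
import numpy as np

def find_colour_target(frame_bgr, lo=(40, 80, 80), hi=(80, 255, 255)):
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array(lo), np.array(hi))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((7, 7), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
    return x + w // 2, y + h // 2   # offset from frame centre steers the UAV
```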