Mulong Xie

Researcher & Builder & Founder · PhD in Artificial Intelligence

Human-AI Interaction & Dynamic Software

Human-Computer Interaction · Software Engineering · Agent Systems

Research

What I work on

01

Human-AI Interaction & Software as Content

Rethinking the interaction layer between humans and agentic systems. The Software as Content (SaC) paradigm proposes that AI should generate live, evolving UIs as agentic applications rather than static textual responses.

Software as Content · Human-Agent Interaction · Agentic Applications
02

Dynamic Software

A new class of software whose frontend is generated on demand and evolves through interaction, rather than being shipped as a fixed, pre-built artifact. The backend is an agent system, not a static codebase.

Agentic Applications · Software Generation · Dynamic Systems
03

GUI Agent

Autonomous agents that navigate GUIs to complete tasks across apps and platforms — combining visual grounding, cross-app action execution, and non-intrusive automation without code instrumentation.

Task Automation · Cross-App Execution · Visual Grounding · RPA
04

Visual Intelligence for Productivity

Intelligently parse UI semantics from raw visual inputs and transform static artifacts into live outputs — element detection, layout understanding, design-to-code, and form digitization.

UI Understanding · Design-to-Code · Form Digitization · Layout Analysis

Selected Work

Featured projects

Software as Content (SaC)

A new human-agent interaction paradigm through generative, on-demand, evolving applications.

Human-AI Interaction · Agentic Systems · HCI Theory

PortalX — Software Modelling

An autonomous agent that explores any given software and provides non-intrusive user assistance and automation. Selected for Miracle Plus (formerly YC China), 2025.

Startup · Software Automation · Enterprise AI

UTA — Universal Task Assistant

A mobile AI agent layer that understands apps and assists users with app-intent mapping, step-by-step function guidance, and task automation.

Mobile AI · Startup · Human-AI Interaction · UI Automation

PhD Thesis — Visual Intelligence for GUI Automation & Beyond

Three years of research on software understanding and automation, predating the agent era.

Software Eng · Visual Intelligence · GUI Automation

Visual Software Semantic Understanding

An unsupervised, vision-based approach to analyzing the spatial and semantic relations among GUI elements and blocks.

Software Eng · Visual Intelligence · UI Understanding

APP Voice Control x Robot Arm

Identifies the target UI component in any app from the user's natural-language request, then uses a robot arm to interact with the device automatically: a physical GUI agent before the agent era.

Software Eng · Computer Vision · UI Automation

Background

Experience & Education

Independent

Researcher & Builder

2026 – Present

Independent · Remote

  • Developing the theoretical foundation and open-source protocol for Software as Content (SaC)
  • Working toward Dynamic Software
Fellou

Co-founder

2025.09 – 2026.03

Fellou · Hybrid

  • Co-founding the world's first agentic browser, leading product design and research
  • Raised over $30M in funding
PortalX

Founder

2025.01 – 2025.09

PortalX · Sydney, Australia

  • Leading R&D and commercialization of intelligent automation agents for enterprise software
  • Selected for Miracle Plus (formerly YC China), Fall 2025: 1% acceptance rate among 5,800+ applicants
CSIRO's Data61

Research Scientist

2024.01 – 2025.10

CSIRO's Data61 · Sydney, Australia

  • Leading multiple research and commercialization projects on responsible AI and software engineering automation
  • CSIRO Technology Innovation Award 2024
  • ON Prime accelerator prize-winning project
CSIRO's Data61

Postdoc

2022.12 – 2024.01

CSIRO's Data61 · Sydney, Australia

  • Leading research on intelligent software engineering and AI safety
  • ACM Best Paper Award
TF-AMD

Full-Stack Engineer Intern

2018.11 – 2019.01

TF-AMD · Penang, Malaysia

  • Developed an internal document search engine as a full-stack engineer
  • Recommendation letter from the Product Director
Education
Australian National University

PhD in Artificial Intelligence

2020 – 2022

Australian National University · Canberra, Australia

  • Thesis: Visual Intelligence for GUI Understanding and Automation
  • National Research Agency's PhD Top-up Scholarship
  • HDR Fee Remission Scholarship
  • Postgraduate Research Scholarship
Australian National University

Bachelor of Software Engineering (Honours)

2018 – 2020

Australian National University · Canberra, Australia

  • Thesis: UI2CODE: Computer Vision Based Reverse Engineering of User Interface Design
  • GovHack 2019 Australian Capital Territory, Runner Up (2019)
  • Government Grant in Innovation Australian Capital Territory Competition (2018)
Nanjing University of Science and Technology

Bachelor of Intelligent Science and Technology

2015 – 2018

Nanjing University of Science and Technology · Jiangsu, China

  • Member of a key laboratory of unmanned aerial vehicles (2017)
  • Class President (2017-2018)

Public Projects

All Public Projects

2026

Independent Researcher

Software as Content (SaC)

A new human-agent interaction paradigm through generative, on-demand, evolving applications.

Human-AI Interaction · Agentic Systems · HCI Theory

2025

Founder

Fellou

Co-founded Fellou, the world's first agentic browser. Raised over $30M in funding.

Startup · Agentic Browser · Human-AI Interaction

PortalX — Software Modelling

An autonomous agent that explores any given software and provides non-intrusive user assistance and automation. Selected for Miracle Plus (formerly YC China), 2025.

Startup · Software Automation · Enterprise AI

2024

Research Scientist

UTA — Universal Task Assistant

A mobile AI agent layer that understands apps and assists users with app-intent mapping, step-by-step function guidance, and task automation.

Mobile AI · Startup · Human-AI Interaction · UI Automation

2023

Research Scientist

NiCro

Record once, replay anywhere. NiCro captures user actions on one device and re-executes them across iOS and Android at any screen size — purely vision-based, zero source code access required.

Software Eng · Computer Vision · UI Auto Testing

2022

PhD in Software Engineering & Human AI Interaction

PhD Thesis — Visual Intelligence for GUI Automation & Beyond

Three years of research on software understanding and automation, predating the agent era.

Software Eng · Visual Intelligence · GUI Automation

Visual Software Semantic Understanding

An unsupervised, vision-based approach to analyzing the spatial and semantic relations among GUI elements and blocks.

Software Eng · Visual Intelligence · UI Understanding

AR x Object Detection

What if AR glasses could understand the real world, not just overlay virtual objects onto it? This project integrates natural object detection with AR wearables — making the environment itself machine-readable.

Computer Vision · Augmented Reality

Palm Recognition

Training-free real-time palm region detection and feature extraction from hand images — using classical image processing to achieve ~18fps without any annotated data or deep learning overhead.

Computer Vision

2021

PhD in Software Engineering & Human AI Interaction

APP Voice Control x Robot Arm

Identifies the target UI component in any app from the user's natural-language request, then uses a robot arm to interact with the device automatically: a physical GUI agent before the agent era.

Software Eng · Computer Vision · UI Automation

UI Component Perceptual Grouping

Humans don't see isolated buttons and labels — they see cards, lists, menus, and tabs. This project applies Gestalt psychology to automatically segment any GUI into perceptual layout blocks, the way a human would.

Software Eng · Computer Vision · UI Understanding

ezForm

Snap a photo of any paper form — ezForm converts it into a fully interactive web form automatically, using computer vision to recognise every field, checkbox, and layout structure.

Software Eng · Computer Vision · VI4P

2020

PhD in Software Engineering & Human AI Interaction

EasyD2C

Upload a UI screenshot, get back modular HTML, CSS, and React code — EasyD2C uses computer vision to reverse-engineer design images into structured, maintainable front-end code.

Software Eng · Computer Vision · VI4P

UI Element Detection (UIED)

Unsupervised detection of UI elements from any GUI screenshot — no training data, no labels. UIED combines Google OCR for text and a CV+CNN pipeline for non-text elements, handling both mobile and desktop UIs.

Software Eng · Computer Vision

2019

Bachelor

GUI Image Code Generation

Computer vision-based reverse engineering of UI design — automatically converts a GUI screenshot or mockup into working UI code and a structured element tree, bridging the designer-to-developer gap.

Software Eng · Computer Vision

2018

Bachelor

Geographical Change Detection & Report

Detect and report land-use changes — vegetation growth, new construction, land clearing — by contrasting satellite images of the same region across time periods using computer vision.

Computer Vision

Universal Keywords Search Engine

Search across massive unstructured document databases (TXT, PDF, Word) using keyword queries. Built on Elasticsearch with a user-friendly web interface.

Search Engine

2017

Bachelor

Digital Target Detection on UAV

Detect physical targets in natural environments and read the digits on them, then instruct an unmanned aerial vehicle to execute the corresponding actions autonomously.

Computer Vision · Unmanned Aerial Vehicle

Color Target Detection on UAV

Detect coloured target regions in dynamic natural environments to instruct the UAV to complete actions — robust to lighting variation, motion blur, and complex backgrounds.

Computer Vision · Unmanned Aerial Vehicle