FileGram: Grounding Agent Personalization
in File-System Behavioral Traces

¹S-Lab, Nanyang Technological University   ²Synvo AI   Corresponding authors
Abstract
FileGram Overview

Coworking AI agents operating within local file systems are rapidly emerging as a paradigm in human–AI interaction. Since users exhibit highly diverse workflows, personalization is essential for tight collaboration and a seamless user experience. However, effective personalization is limited by severe data constraints, since strict privacy barriers and the inherent difficulty of jointly collecting multimodal real-world traces preclude the creation of scalable training data and comprehensive evaluation suites. Consequently, existing methods remain interaction-centric and overlook dense behavioral cues embedded in file-level activities. To bridge this gap, we propose FileGram, a comprehensive framework that grounds agent memory and personalization in file-system behavioral traces. FileGram comprises three core components: (1) FileGramEngine, a scalable, persona-driven data engine that simulates realistic workflows; (2) FileGramBench, a diagnostic benchmark that treats file operations as behavioral engrams; (3) FileGramOS, a bottom-up memory architecture that builds user profiles directly from atomic file-level signals. Extensive experiments show that FileGramBench remains challenging for state-of-the-art memory systems, and demonstrate the effectiveness of FileGramEngine and FileGramOS.

Framework
Three components address data scarcity, evaluation gaps, and method limitations.
Data Generation
FileGramEngine

Scalable persona-driven simulation producing 640 controlled trajectories with ground-truth labels across 6 behavioral dimensions and 20 user profiles.

Evaluation
FileGramBench

First file-system memory benchmark. Four tracks: profile reconstruction, reasoning, anomaly detection, and multimodal visual grounding.

Method
FileGramOS

Bottom-up memory that builds user profiles from atomic file signals (procedural, semantic, and episodic channels) rather than from dialogue summaries.

FileGramEngine
User Profiles: 20
Tasks: 32
Trajectories: 640
Dimensions: 6
Data Pipeline

FileGramEngine simulates realistic file-system workflows via persona-driven agents. Each profile is defined by six behavioral dimensions, producing fine-grained multimodal action sequences at scale.
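The headline counts are consistent with every profile being run on every task (20 × 32 = 640). The sketch below enumerates such a grid. The full cross product is an assumption inferred from the counts, and the `TrajectorySpec` class and the identifier formats are hypothetical names used only for illustration, not part of FileGramEngine.

```python
from dataclasses import dataclass
from itertools import product

# Counts stated on this page.
NUM_PROFILES = 20
NUM_TASKS = 32


@dataclass(frozen=True)
class TrajectorySpec:
    """One simulated run: a single persona profile attempting a single task."""
    profile_id: str  # hypothetical identifier format
    task_id: str     # hypothetical identifier format


def enumerate_trajectories(num_profiles: int = NUM_PROFILES,
                           num_tasks: int = NUM_TASKS) -> list[TrajectorySpec]:
    """Pair every profile with every task (assumed full cross product)."""
    profiles = [f"profile_{i:02d}" for i in range(1, num_profiles + 1)]
    tasks = [f"task_{j:02d}" for j in range(1, num_tasks + 1)]
    return [TrajectorySpec(p, t) for p, t in product(profiles, tasks)]


specs = enumerate_trajectories()
print(len(specs))  # 640
```

Under this assumption each trajectory carries its profile label by construction, which is one way ground-truth labels could be attached to simulated runs.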

~10K Output Files
PDF: 3,609
Document: 3,093
Markdown: 2,310
Presentation: 1,547
Image: 1,031
Audio: 516
Spreadsheet: 516
20,028 Atomic Actions
File Read: 4,541
Cross-File Ref: 4,094
Context Switch: 3,909
File Write: 3,024
File Browse: 1,649
File Edit: 1,057
Dir Create: 944
Others: 810
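To read the action breakdown as proportions, the counts listed above can be normalized by their total. This is a small illustrative calculation over the numbers on this page. The dictionary and function names are chosen here for the example.

```python
# Atomic action counts as listed on this page.
ACTION_COUNTS = {
    "File Read": 4541,
    "Cross-File Ref": 4094,
    "Context Switch": 3909,
    "File Write": 3024,
    "File Browse": 1649,
    "File Edit": 1057,
    "Dir Create": 944,
    "Others": 810,
}


def action_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each action type's fraction of all recorded actions."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


total_actions = sum(ACTION_COUNTS.values())
shares = action_shares(ACTION_COUNTS)

print(f"total: {total_actions:,}")  # total: 20,028
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<15}{share:6.1%}")
```

The per-type counts sum to the stated 20,028 atomic actions.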
FileGramBench
4,653 QA pairs across 4 evaluation tracks and 3 memory channels.
Track I, Understanding: Profile reconstruction & fingerprinting (886 questions)
Track II, Reasoning: Pattern inference & disentanglement (1,694 questions)
Track III, Detection: Anomaly & behavioral drift (1,103 questions)
Track IV, Multimodal: Visual grounding from recordings (650 questions)
QA Examples

Example questions from the four evaluation tracks, testing behavioral memory from procedural file operations to cross-modal visual reasoning.

FileGramOS
Stage 1, Engram Encoding: Per-trajectory atomic signal extraction
Stage 2, Consolidation: Cross-engram structured MemoryStore
Stage 3, Adaptive Retrieval: Query-time evidence composition
FileGramOS Architecture

FileGramOS builds profiles from atomic file signals, preserving procedural, semantic, and episodic memory through a three-stage bottom-up pipeline.
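The three stages can be pictured with a toy pipeline. The sketch below is not the FileGramOS implementation: the `(action_type, file_path)` event format, the `Engram` and `MemoryStore` fields, the use of simple counters as signals, and channel selection by name at query time are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from pathlib import PurePosixPath

Action = tuple[str, str]  # (action_type, file_path); hypothetical event format


@dataclass
class Engram:
    trajectory_id: str
    procedural: Counter      # how the user works: counts per action type
    semantic: Counter        # what the user works on: counts per file extension
    episodic: list[Action]   # what happened, in order


def encode(trajectory_id: str, actions: list[Action]) -> Engram:
    """Stage 1 (toy): extract atomic signals from one trajectory."""
    return Engram(
        trajectory_id=trajectory_id,
        procedural=Counter(kind for kind, _ in actions),
        semantic=Counter(PurePosixPath(path).suffix or "<none>" for _, path in actions),
        episodic=list(actions),
    )


@dataclass
class MemoryStore:
    procedural: Counter = field(default_factory=Counter)
    semantic: Counter = field(default_factory=Counter)
    episodic: dict[str, list[Action]] = field(default_factory=dict)


def consolidate(engrams: list[Engram]) -> MemoryStore:
    """Stage 2 (toy): merge per-trajectory engrams into one structured store."""
    store = MemoryStore()
    for engram in engrams:
        store.procedural.update(engram.procedural)
        store.semantic.update(engram.semantic)
        store.episodic[engram.trajectory_id] = engram.episodic
    return store


def retrieve(store: MemoryStore, channels: list[str], top_k: int = 3) -> dict[str, object]:
    """Stage 3 (toy): compose evidence from only the channels a query needs."""
    evidence: dict[str, object] = {}
    if "procedural" in channels:
        evidence["procedural"] = store.procedural.most_common(top_k)
    if "semantic" in channels:
        evidence["semantic"] = store.semantic.most_common(top_k)
    if "episodic" in channels:
        evidence["episodic"] = {tid: ev[:top_k] for tid, ev in store.episodic.items()}
    return evidence


# Made-up trajectories, purely for demonstration.
trajectories = {
    "t1": [("read", "notes/plan.md"), ("write", "out/report.pdf"), ("read", "data/a.csv")],
    "t2": [("read", "notes/todo.md"), ("edit", "notes/todo.md")],
}
store = consolidate([encode(tid, acts) for tid, acts in trajectories.items()])
evidence = retrieve(store, ["procedural", "semantic"], top_k=1)
print(evidence)
```

The point of the sketch is the direction of data flow: the profile is assembled from low-level file events upward, and the store keeps the three channels separate so that a query can draw on whichever ones it needs.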

Results
FileGramOS outperforms all baselines on FileGramBench.
Main Results Table
SimpleMem: 32.9
Mem0: 33.2
MemOS: 36.2
Zep: 40.2
Naive RAG: 40.5
MemU: 44.4
MMA: 44.7
Full Context: 48.0
Eager Summ.: 49.5
EverMemOS: 49.9
VisRAG: 51.9
FileGramOS: 59.6
+7.7 points over the best baseline (VisRAG, 51.9)
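The gap to the strongest baseline can be recomputed from the scores listed above. The short calculation below uses only the numbers on this page. The function name is chosen for this example, and the relative figure is included only to distinguish an absolute difference in score from a relative gain.

```python
# Overall scores as listed on this page.
SCORES = {
    "SimpleMem": 32.9, "Mem0": 33.2, "MemOS": 36.2, "Zep": 40.2,
    "Naive RAG": 40.5, "MemU": 44.4, "MMA": 44.7, "Full Context": 48.0,
    "Eager Summ.": 49.5, "EverMemOS": 49.9, "VisRAG": 51.9, "FileGramOS": 59.6,
}


def margin_over_best_baseline(scores: dict[str, float], system: str) -> tuple[str, float, float]:
    """Return (best baseline, absolute gap in points, relative gain in percent)."""
    baselines = {name: s for name, s in scores.items() if name != system}
    best = max(baselines, key=baselines.get)
    gap = scores[system] - baselines[best]
    return best, round(gap, 1), round(100 * gap / baselines[best], 1)


best, gap_points, relative_gain = margin_over_best_baseline(SCORES, "FileGramOS")
print(best, gap_points, relative_gain)  # VisRAG 7.7 14.8
```

The 7.7 figure is the absolute difference between 59.6 and 51.9. Expressed relative to the VisRAG score, the same gap is about 14.8%.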
Qualitative Results

Qualitative comparison. Left: A BehavFP question where FileGramOS's three-channel architecture jointly recovers the correct profile, while baselines each miss different signals. Right: A TraceDis question involving multimodal artifacts, where cross-format output gaps cause widespread failures.

BibTeX

Coming soon.