Polymath Society | Candidate Evaluation

Task

Build an agent that does better on LegalBench-RAG — an open benchmark for case-law retrieval and analysis, scored against ground-truth citations from real legal writing.

Also do anything else you think could help the law industry adopt AI at large.

Top 1% of Big Tech employees in the Bay Area

3rd in cohort on raw output, among 20 candidates

28% LegalBench-RAG score on the held-out test set: 4× a naive Opus 4.7 baseline (7%)

42 experiments run across the build vs. a cohort average of 18; shot count is a strong proxy for intensity

FDE skills · demonstrated this cohort

·Has read extensively on agent harnesses — scaffolding, constraints, knowledge management.
·Built an eval set of 80 test cases from scratch — strong eval-engineering instincts.
·Experimented extensively with retrieval and memory architectures.

Demo · what he built

Walks through the agent end-to-end on a real LegalBench-RAG task: retrieval, reasoning, citation. The last 90 seconds show the vibe-coded interface a lawyer can drive directly. Recorded on day 14, single take, unedited.

Percentile vs. big tech engineers (Amazon / Google)

Estimated by big tech employees themselves, based on their co-workers. Dashed ring is the median big-tech engineer.

[Radar chart. Legend: this candidate; median big-tech engineer (dashed ring).]

Rating scale · for observations + downsides

bad: bottom of working engineers
okay: mid-tier of working engineers
good: top quartile of working engineers
great: top decile of working engineers
exceptional: Anthropic researcher / FAANG L6 tier · top 1–2%

What we observed

1 · exceptional

Very high agency and judgement — built a self-improving harness and a UI for lawyer feedback, and read the surrounding literature extensively on his own.

Why

·Read up on the relevant literature quickly before making decisions, both on legal terminology and on agent-harness best practices, retrieval mechanisms, etc. Ramp speed was very high (top 10% of cohort).
·On day 6 he realised he was iterating on the harness too slowly in code. Built a UI for lawyers to give feedback, with a self-improving harness.
·Set up his own CI/CD pipeline to test and version-control each experiment.

2 · great

Intense worker: 6th out of 20 on intensity of work for the hours he put in.

Why

·Put in an average of 6.2 hours every day. Cohort average was 4.2 hours. 5+ hours on 13 of 14 days.
·Average focus of 6 (60% flow state) on a scale of 8 — cohort average was 2.5.
·On day 3, read 11 blogs and papers on harnesses and evals over a span of 4 hours. More stamina than most.
·Operated with a sense of urgency — set a schedule, set ambitious goals, and got 80% of tasks done on time.

3 · great

Highly parallel AI tool usage.

Why

·After feedback, went from running AI tools single-threaded 80% of the time to single-threaded only 10% of the time.
·On day 3, over a span of 2 hours, managed the design of the lawyer-feedback website and 2 coding agents in parallel.
·Behaviour change was fairly immediate.

Downsides

1 · okay

Over-delegating to AI.

Why

·Often handed technical tasks over to AI completely while leaving them underspecified. Stepped in only after nearly half an hour of flailing, once he had decided the AI couldn't do the task.
·Even when designing interfaces, relied on an AI first draft rather than specifying the design to the AI in granular detail.
·Our prior is that this will improve quickly with feedback.

2 · okay

Over-deliberates decisions instead of getting real-world feedback.

Why

·Made 5 design iterations over 2 days for the lawyer-facing platform.
·Did not contact real lawyers to put it in front of them. When the AI recommended that, he decided against it out of fear of burning social capital.

In their own words · self-reported

These are the candidate's own answers about how they like to work. Self-reported, not a measured assessment — useful for fit, not a personality verdict.

How do you best like to work with people? Be as descriptive as you can about how you are with teams, at work and otherwise.

I do my best work heads-down and solo, but I'm not closed off — I like a small team where everyone owns a clear piece, we sync briefly, then go deep on our own. I default to writing things down over scheduling a meeting. In a group I'm more of a quiet builder than the loudest voice; I'll push back on something I disagree with, but usually in writing or 1:1 rather than in a big room. Socially I'm friendly but reserved, and I recharge alone.

How do you like feedback?

Direct and written, so I can sit with it. I tend to act on it a day later rather than in the moment — I like to think it through before changing course.

When are you most productive? What is your ideal way to work?

Early morning and late evening; the middle of the day is my weakest stretch. My ideal day is two or three uninterrupted 3–4 hour blocks, a goal set up front, and notifications off until the block is done.

What drains you?

Lots of small context-switches and back-to-back meetings. Cold outreach to strangers is the thing I avoid most.

Day by day · at a glance

FOCUS chip · % time in flow

slow (0.5–1.5): <30% in flow state
okay (1.5–3): 30–50% in flow state
intense (3–7): 50–70% in flow state
very intense (7–8): >70% in flow state
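
The FOCUS legend can be read as a lookup from a chip score to a band. The sketch below is one way to express that lookup. It is not the report's own tooling: the function name `chip_label` is hypothetical, and because adjacent bands in the legend share an endpoint (1.5, 3, 7), the sketch assumes a shared endpoint belongs to the higher band.

```python
from bisect import bisect_right

# Upper edges of the first three bands, copied from the legend.
# Assumption: a score that sits exactly on a shared edge counts toward the higher band.
BAND_EDGES = [1.5, 3.0, 7.0]
BAND_LABELS = ["slow", "okay", "intense", "very intense"]


def chip_label(score: float) -> str:
    """Map a chip score on the 0.5-8 scale to its legend band (hypothetical helper)."""
    if not 0.5 <= score <= 8.0:
        raise ValueError(f"score {score} is outside the 0.5-8 legend range")
    # bisect_right counts how many band edges are <= score, which is the band index.
    return BAND_LABELS[bisect_right(BAND_EDGES, score)]


if __name__ == "__main__":
    print(chip_label(6))    # the candidate's reported average focus
    print(chip_label(2.5))  # the reported cohort average focus
```

Under these assumptions, the candidate's average focus of 6 falls in the "intense" band and the cohort average of 2.5 falls in "okay".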

OUTPUT chip · vs. cohort

slow (0.5–1.5): bottom of cohort that day
okay (1.5–3): around cohort median
intense (3–7): top tier of cohort that day
very intense (7–8): #1 or #2 of cohort that day

At a glance · across 14 days

hours: actual hours worked
focus: slow → very intense
output: slow → very intense

Resume · background

····

Education

University of California San Diego · Sep 2023 – Mar 2025

M.S. in Computer Science and Engineering · Specialization in Artificial Intelligence · GPA 4.00 / 4.00

·Key Courses: Computer Vision, Robotics, ML Systems, Software Engineering, Recommender Systems

Indian Institute of Technology Bombay · Jul 2019 – Jul 2023

B.Tech with Honors in Computer Science and Engineering · Minor in Entrepreneurship · CPI 9.66 / 10

·Key Courses: Advanced Image Processing, Machine Learning, Linear Algebra, Probabilistic Theory, Web Security

Experience

Computer Vision Intern · Duality AI · Jun 2024 – Sep 2024

·Built pipelines to generate high-fidelity Gaussian Splatting synthetic environments to validate vision models in real-world settings
·Designed automated 3D reconstruction techniques for featureless objects, reducing digital-twin generation time by 40%
·Collaborated with Autodesk to validate Unreal Engine simulations for robotics tasks; structured domain randomization reduced Sim2Real gap and increased mAP-50 by 15% for object detection and segmentation

Data and Applied Scientist Intern · Microsoft India · May 2022 – Jul 2022

·Developed a decision-tree ranker to recommend emails without user queries, improving Outlook search capabilities
·Integrated data pipelines across team infrastructures, combining user-specific features from large-scale context logs
·Proposed hierarchical feature-sets for the ranker, reducing latency for recommendations and improving recall

Key Projects

Mirror AI: Deployable Personas · Oct 2024 – Dec 2024

Honorable mention, Supabase YC Hackathon

·Designed an agentic LLM architecture with LangGraph to mirror user personalities as interactive digital personas
·Deployed a full-stack platform using Supabase + Vercel for secure hosting and user authentication
·Integrated with the Notion API for personal context; one-click deployment to publish a persona

Improving LLM Reasoning for Numerical Problems · Sep 2024 – Dec 2024

·Enhanced MathPrompter (ACL 2023) with chain-of-thought, achieving 10% higher accuracy on Llama 3.1 1B where prior methods failed
·Reduced hallucination rates by integrating multi-step validation for robust, consistent outputs

Inverse Rendering with 2D Gaussian Splatting · Mar 2024 – May 2024

·Built a novel inverse-rendering framework in CUDA to recover PBR properties of a scene using 2D Gaussian Splatting
·Improved normal-map MAE by 15% over current SOTA, achieving superior novel-view synthesis and relighting

Real-time 3D Perception for Home Robots · Sep 2023 – Sep 2024

Graduate Student Researcher, Supervisor: · UC San Diego

·Investigated real-time dense visual SLAM methods using NeRFs and Gaussian Splatting for robot navigation
·Integrated object segmentation, grasp-pose estimation, and 3D mapping on the Fetch robot via ROS; novel tabletop rearrangement algorithm reduced cost by 20% vs. SOTA

3D Tomography with Primal-Dual Neural Networks · May 2021 – Jul 2023

UCL Research Internship, Supervisor: · University College London

·Built a stochastic neural-network architecture of a primal-dual algorithm for online 3D-volume reconstruction from tomographic projections; 99.6% structural similarity in low-dosage conditions
·Shipped a Python library with custom gradient operators for single-pass volume reconstruction, cutting compute by up to 5× over SOTA learning-based approaches

Other

·Image Colorization GAN — web app coloring grayscale images using pix2pix U-Net architecture
·Sudoku Solver — Augmented Reality app solving Sudoku from a live feed with robust real-time performance
·Autonomous Robot — Roomba-like robot with visual-SLAM using EKF and A* path planning on ROS

Skills

Programming: C++, C, Python, MATLAB, Linux & Bash, SQL, HTML, JavaScript
Tools: PyTorch, ROS, TensorFlow, scikit-learn, OpenCV, ReactJS, Matplotlib, Arduino
Expertise in: Full-stack development, Generative AI, 3D Perception, ML Systems, Statistical Image Processing