Data Exhaust

What happens when you give Claude skills

Pulp Conversations

Jul 02, 2026

You ask Claude to summarize your emails. You connect it to your calendar so it can draft meeting notes. You give it access to your health app because you want better sleep advice.

Each request feels contained. A task completed, a problem solved, then you move on.

But the data doesn’t.

It accumulates. It combines. It becomes a fingerprint more revealing than anything you’d knowingly hand over. And because the transfer happens in fragments, you may never see the whole picture.

This isn’t a story about villains. It’s about incentives, compounding effects, and the gap between what you think you’re sharing and what the system actually learns.

Scenario One: The Productivity Trap

Deadlines are stacking up, so you start using Claude to help draft client emails, outline proposals, and refine pitches.

Then you connect it to Gmail so it can pull context from past threads. Next, you integrate your calendar so it knows when you’re free. You link Notion so it can reference your project notes.

What you think is happening:

Claude helps you write faster. Your inbox stays manageable. You hit deadlines.

What’s actually happening:

Every email Claude reads to draft a reply becomes part of your input history. If you’re opted in to training (the default for most users), that email — with client names, project details, negotiation terms, your writing style, your concerns — is retained for five years and used to train future models.

Even if you’re opted out, any email flagged for safety review (a client forwards you something controversial, you paste a heated exchange) gets used for model improvement anyway.

The exhaust trail: what becomes part of Anthropic’s dataset?

Some or most of your emails
Your calendar, including patterns of when you work, who you meet and how often you reschedule
Your Notion notes, project structures, client relationships and revenue expectations
Your usage metadata, including what time you ask for help, what topics stress you out and when you’re most productive

The compounding effect:

Six months in, the system knows:

Your negotiation style and weak points
Which clients you’re worried about losing
When you’re behind on deadlines (more frequent late-night Claude usage)
Your income sources and their relative stability
How you respond to pressure

You don’t choose to reveal these secrets; they emerge from the pattern of small, instrumental requests.

How does your data get used?

High probability: Your writing patterns train the model’s understanding of “professional communication under deadline pressure.” The next writer who asks Claude for help gets suggestions informed by your style, your compromises, your voice.

Medium probability: A future employer runs a reference check. They don’t call your old boss; they analyze your publicly available writing against the model’s learned patterns and flag stylistic inconsistencies that suggest you had “editorial assistance.” You don’t get the job. You never know why.

Low probability, high impact: A data broker combines your Claude usage metadata (stressed writer, inconsistent hours, frequent revisions) with your credit card data (late payments, high-interest debt) and your LinkedIn activity (job search keywords). The combined profile flags you as high-risk for loan underwriting. Your mortgage application gets denied. The model’s decision is proprietary. You can’t contest what you can’t see.

Scenario Two: The Wellness Penalty

You’re trying to get healthier, so you connect Claude to Apple Health to get personalized advice on sleep, exercise, and stress management.

It works. You start getting better insights than any fitness app ever gave you! You feel like you’re finally making progress.

Then you participate in an Anthropic research study about AI health coaching. They offer $50 in exchange for filling out a survey about your goals and challenges.

What you think is happening:

You’re helping improve AI health tools. Your data is anonymized. It’s for research.

What’s actually happening:

Per Anthropic’s new policy: “We may combine this data with other data from your account for aggregated analysis.”

Your health metrics (sleep disruption, stress markers in HRV data, exercise inconsistency) get cross-referenced with:

Your conversation history (topics you ask about: focus, productivity, parenting stress)
Your usage patterns (when you use Claude: 6-8 AM, 9-11 PM — the hours around your daughter’s schedule)
Your technical metadata (location, device type, timezone)
Your study responses (self-reported stress, work-life balance concerns)

The exhaust trail:

Your health data becomes part of Anthropic’s research dataset
Your conversation topics contribute to model training
Combined, your data reveals behavioral patterns

The compounding effect:

Anthropic publishes research: “Users with disrupted sleep patterns + afternoon productivity queries + parenting-age demographics show 40% higher stress markers and 25% more frequent AI usage during traditional family hours.”

The research is anonymized. But your pattern is specific:

Male, late 40s, NYC, writer
Sleep disruption consistent with young child
Usage peaks before 8 AM and after 9 PM
Questions about focus, time management, creative work, parenting

How does your data get used?

High probability: You’re also enrolled in a UnitedHealthcare wellness program (free Apple Watch if you hit 10k steps/day). UHC’s analytics vendor runs the published pattern against their dataset and gets an 85% match for policyholder #847392.

They don’t know it’s you. But they know policyholder #847392 fits a profile that correlates with:

Higher likelihood of mental health service utilization
Increased disability claim probability
Elevated stress-related health costs

Your next premium renewal goes up 12%.

The pricing model is proprietary. You have no way to trace it back to the moment you asked Claude for sleep advice.

Medium probability: Your employer offers a voluntary wellness program through the same vendor. You opted in for the HSA contribution. The vendor now has two datasets that match the same pattern. They don’t need your name — the overlap is the identifier. Your manager gets a report flagging “high-stress employees at risk for burnout.” You’re not named, but you’re in the cluster. The next promotion goes to someone else.

Low probability, high impact: You apply for life insurance. The underwriter uses a third-party risk model that incorporates “publicly available health and wellness research.” Your application gets flagged for additional medical screening. The process takes six months instead of six weeks. You miss the coverage window you needed before a planned surgery. You pay out of pocket.

Scenario Three: The Parent Trap

You’re a parent. Your 6-year-old daughter is struggling in school. You ask Claude for advice on learning strategies, behavioral issues, and how to talk to her teacher.

You also use Claude to help draft emails to the school, research educational resources, and brainstorm activities that might help her engage.

What you think is happening:

You’re getting parenting support. The conversations are private. You’re being a good parent.

What’s actually happening:

Every question about your daughter becomes part of your conversation history. If you’re opted in, it trains the model.

The questions reveal:

Your daughter’s age, school, learning challenges
Your parenting style, anxieties, areas where you feel inadequate
Your relationship with the school (cooperative, adversarial, deferential)
Your socioeconomic context (private school vs. public, tutoring budget, enrichment activities)

The exhaust trail:

Your questions train the model (if opted in)
Your email drafts to the school reside in Gmail API logs and Anthropic’s input records
Your research queries show a pattern of concern that reveals more than any single question

The compounding effect:

You also use Google Docs (connected to Gemini) to keep notes on parent-teacher conferences. You use your work Slack (connected to Claude) to vent to a colleague about school stress. You search Google for “NYC private schools” and “learning disability testing.”

Each platform has a piece. None of them has the whole picture. But the pieces fit together.

How does your data get used?

High probability: The model learns that parents of struggling 6-year-olds ask certain types of questions in certain patterns. The next parent who asks for help gets suggestions informed by your anxieties, your compromises, your blind spots. Your data becomes the training set for advice you might never give in person.

Medium probability: Your daughter applies to middle school in three years. The school uses an AI-powered admissions tool that analyzes “family engagement patterns.”

It doesn’t have access to your Claude conversations directly, but it does have access to:

Your email communication patterns with the current school (frequency, tone, keywords)
Your public social media activity (what you share, when, what you don’t)
Your digital footprint (searches, clicks, time spent on school-related content)

The tool flags your family as “high-maintenance” based on pattern matching. Your daughter doesn’t get in.

The decision is algorithmic. You never see the criteria.

Low probability, high impact: You’re in a custody dispute. Your ex’s lawyer subpoenas your AI conversation history as evidence of “parental fitness.” The court allows it because you opted in to training. The data is technically “not private” under the terms you agreed to. Your questions about your daughter’s behavioral issues, taken out of context, get used against you. You lose custody time.

The Pattern of Data Exhaust

Each interaction feels small, but the aggregate is revealing. Algorithms can find correlations you’d never see yourself. Data collected for one purpose (help me write an email) is easily used for another (train a model, inform a risk score). And as in the lending, insurance, and admissions examples, decisions are made by systems you can’t audit, based on criteria you can’t see.

What You Can Do

Every convenience has a cost, and the cost is usually invisible until it’s too late to negotiate.

You should assume that any data you give one platform can be matched with data from another. The more specific your pattern, the easier you are to identify.
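To make the matching idea concrete, here is a minimal Python sketch of a linkage on quasi-identifiers. Everything in it is hypothetical: the records are made up, the attribute names and the `matching_ids` and `narrowing` helpers are invented for illustration, and exact attribute matching is a simplification of whatever probabilistic scoring a real analytics vendor would use. It shows how each attribute from a published pattern shrinks the pool of candidates until a single opaque ID is left, without a name ever being involved.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]

# Hypothetical, made-up records standing in for a second platform's
# "anonymized" dataset: no names, only an opaque ID plus attributes.
POPULATION: List[Record] = [
    {"id": "847392", "sex": "M", "age": "45-49", "city": "NYC", "job": "writer", "peak": "early+late"},
    {"id": "100201", "sex": "M", "age": "45-49", "city": "NYC", "job": "writer", "peak": "daytime"},
    {"id": "100202", "sex": "M", "age": "45-49", "city": "NYC", "job": "lawyer", "peak": "early+late"},
    {"id": "100203", "sex": "F", "age": "45-49", "city": "NYC", "job": "writer", "peak": "early+late"},
    {"id": "100204", "sex": "M", "age": "30-34", "city": "NYC", "job": "writer", "peak": "early+late"},
    {"id": "100205", "sex": "M", "age": "45-49", "city": "LA", "job": "writer", "peak": "early+late"},
]


def matching_ids(population: List[Record], pattern: Dict[str, str]) -> List[str]:
    """Return the IDs of records whose attributes equal every key/value in pattern."""
    return [
        r["id"]
        for r in population
        if all(r.get(key) == value for key, value in pattern.items())
    ]


def narrowing(population: List[Record], pattern: Dict[str, str]) -> List[Tuple[str, int]]:
    """Add the pattern's attributes one at a time and count the remaining candidates."""
    steps: List[Tuple[str, int]] = []
    partial: Dict[str, str] = {}
    for key, value in pattern.items():
        partial[key] = value
        steps.append((key, len(matching_ids(population, partial))))
    return steps


# Hypothetical "published pattern", echoing the profile in Scenario Two.
published_pattern = {"sex": "M", "age": "45-49", "city": "NYC", "job": "writer", "peak": "early+late"}

if __name__ == "__main__":
    for attr, count in narrowing(POPULATION, published_pattern):
        print(f"after matching on {attr!r}: {count} candidate(s)")
    print("re-identified:", matching_ids(POPULATION, published_pattern))
```

With these six made-up records, the candidate count falls from 5 to 4, 3, 2, and finally 1 as sex, age band, city, occupation, and usage peak are added. Real datasets are far larger, but the logic is the same: each extra attribute filters the pool further, and the overlap becomes the identifier.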

Opting out of training doesn’t mean opting out of storage. Data retained for “safety” or “research” is still data that can be subpoenaed, breached, or repurposed.

Be mindful of your data exhaust. When you use AI, work inside a SOC 2-compliant AI container, or use a local, air-gapped LLM for the most sensitive tasks.

Flaws ain't flaws when it's you that makes the call - Pusha T

Music: HIPS by Russi and Epping
