Babysitting a Toddler Who Forgot the Rules: What Custom Agents Taught Me About Automation Reliability

I love Notion. I’ve built a lot of my work and life systems inside it.

That’s why I was genuinely excited to try Notion’s custom agents for something I do every day: inbox triage.

The plan was simple: I connected three Gmail inboxes through Notion Mail (via the Gmail connector) and set up an agent to run the same workflow every morning at 5:30am—organize, triage, draft replies when appropriate, and send me a notification so I could start the day already in motion.

I tested it. It worked.

And then… it started behaving like a toddler who knows the rules and occasionally decides they don’t apply.

This isn’t a “Notion failed” post. It’s a “this is what automation is like in real life” post—because I’ve seen the same pattern across a lot of platforms, tools, and workflows.

One important detail: I didn’t build this agent from scratch. I started with Notion’s own custom agent template—the “official” starting point designed to help you get value quickly.

Which is why the inconsistency was so surprising: this wasn’t an experimental DIY setup. It was the out-of-the-box path.

The promise: the invisible work gets handled
The dream of automation isn’t that it’s fancy.
It’s that it’s reliable.

When something runs every day, quietly, in the background, it creates this beautiful sense of stability:
• the work is happening
• you don’t have to think about it
• you can trust the system to keep its promises

That trust is the product.

The reality: inconsistent success is worse than failure
What happened wasn’t constant failure. If anything, it was more frustrating than that.

It was inconsistent success.

Sometimes it wouldn’t process all three inboxes
Some mornings, I’d see evidence it ran for one Gmail account but not the other two—like it forgot there were multiple inboxes connected at all.

Sometimes it wouldn’t notify me
Even on mornings when the agent did run, the notification didn’t always arrive. And once notifications become unreliable, you’re forced into supervision mode:
“Did it run? Did it run everywhere? Do I need to check manually?”

Sometimes it would draft replies… and sometimes it wouldn’t
The drafting behavior varied from run to run: same instructions, same setup, different outcomes.
And that’s where the “babysitting” metaphor really clicks: I wasn’t delegating. I was monitoring.

Why this happens (and why it’s not a Notion-specific problem)
This is where I want to zoom out, because I don’t think this is unique to Notion—or even unique to AI.
I see this across automations in general: Zapier/Make workflows, API connectors, scheduled jobs, inbox rules, “smart” assistants, background sync tools.

Once you connect multiple systems, you inherit the reality that:
• connectors fail quietly sometimes
• authentication can drift (tokens expire, permissions get revoked)
• rate limits exist
• retries happen… or don’t
• partial success looks like success unless it’s reported clearly
• the system may have done something, but not everything

And unless the automation is designed with great observability (clear logs, run history, and success/failure reporting), you’ll never fully know what happened without checking.
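
To make the partial-success point concrete, here is a minimal Python sketch of a run that handles each inbox separately, retries a bounded number of times, and refuses to call the run a success unless every inbox succeeded. Everything in it is hypothetical: `process_inbox` stands in for whatever the connector actually does, and the names and status labels are mine, not Notion’s API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Outcome:
    inbox: str
    ok: bool
    attempts: int
    error: str = ""


def run_inbox(inbox: str, process_inbox: Callable[[str], None],
              max_attempts: int = 3) -> Outcome:
    """Try one inbox up to max_attempts times and record what happened."""
    last_error = ""
    for attempt in range(1, max_attempts + 1):
        try:
            process_inbox(inbox)  # hypothetical stand-in for the real work
            return Outcome(inbox, ok=True, attempts=attempt)
        except Exception as exc:  # a real job would catch narrower errors
            last_error = f"{type(exc).__name__}: {exc}"
    return Outcome(inbox, ok=False, attempts=max_attempts, error=last_error)


def overall_status(outcomes: List[Outcome]) -> str:
    """Report 'success' only when every inbox succeeded."""
    succeeded = sum(1 for o in outcomes if o.ok)
    if outcomes and succeeded == len(outcomes):
        return "success"
    if succeeded == 0:
        return "failed"
    return "partial"


def run_all(inboxes: List[str], process_inbox: Callable[[str], None],
            max_attempts: int = 3):
    """Run every inbox independently so one failure cannot hide the others."""
    outcomes = [run_inbox(i, process_inbox, max_attempts) for i in inboxes]
    return overall_status(outcomes), outcomes
```

The design choice that matters is the third status. A run that covered one inbox out of three comes back as "partial" rather than being rounded up to "success", and an empty run counts as "failed".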

And to be clear, I’m not blaming Notion or pretending this is a simple problem. Templates can’t account for every edge case—especially once you introduce multiple accounts, connectors, permissions, and schedules.

But templates do set expectations. When the default path behaves unpredictably, it highlights the bigger truth: automation reliability is less about how smart the agent is, and more about how well the system handles real-world variance.

The actual lesson: trust requires receipts
I still want this future. I want agents. I want delegated work. I want the 5:30am magic.

But here’s what this experience reinforced for me:

If an automation can’t reliably tell you:
• what ran
• what didn’t run
• why
• what it changed / drafted / skipped
…then it’s not really automation yet. It’s a slot machine that occasionally pays out.
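
As a rough illustration of what such a receipt could look like, the Python sketch below compares the inboxes a run was supposed to cover with what it actually reported, then renders the four items from the list above as plain text. The field names (`drafted`, `skipped`, `error`) and the layout are assumptions for illustration, not something Notion’s agents produce.

```python
from typing import Dict, List


def build_receipt(expected: List[str], reports: Dict[str, dict]) -> str:
    """Render a plain-text receipt: what ran, what didn't, why, and what changed.

    `reports` maps an inbox name to a hypothetical per-inbox report such as
    {"drafted": 2, "skipped": 5} or {"error": "token expired"}.
    """
    lines = []
    for inbox in expected:
        report = reports.get(inbox)
        if report is None:
            # No report at all is the silent failure worth surfacing.
            lines.append(f"DID NOT RUN  {inbox}: no report received")
        elif "error" in report:
            lines.append(f"FAILED       {inbox}: {report['error']}")
        else:
            drafted = report.get("drafted", 0)
            skipped = report.get("skipped", 0)
            lines.append(f"RAN          {inbox}: drafted {drafted}, skipped {skipped}")
    # Count successes before adding the summary line at the top.
    ran = sum(1 for line in lines if line.startswith("RAN"))
    lines.insert(0, f"{ran} of {len(expected)} inboxes processed")
    return "\n".join(lines)
```

The receipt is driven by the list of expected inboxes, not by whatever the run happened to report, so an inbox that was silently skipped still gets its own line.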
The weird part is that even if it “mostly works,” the inconsistency creates a new kind of work:
• checking
• confirming
• cleaning up
• re-running
• rebuilding trust
At that point, the system isn’t saving your attention—it’s consuming it.

I’m still rooting for it
To be clear: I’m not mad at Notion. I’m not even disappointed in the idea.
I’m noticing a very normal stage in the evolution of automation: the gap between “this is possible” and “this is dependable.”
Notion’s custom agents feel like they’re pointing at the right future.
I just want them to grow out of the toddler era.