ION / DATA PLATFORM / AI ACCESS LAYER
Draft for team review
01 / Position

ION AI Data Access Platform

The platform should not be a giant natural-language SQL machine. It should be a governed access layer where agents, chat interfaces, and internal tools use approved business-level tools with identity, permissions, citations, and audit logs built in.

MCP Gateway Business Tools Google Cloud Vanna MVP Agent Ready Permission Enforced
Replace
Broad LLM-generated SQL over raw BigQuery and production Postgres.
With
Approved MCP/API tools that express business capabilities like client timelines, ad performance, rank visibility, and Slack context.
Gateway Role
Zach's MCP Gateway can act as the front door for user context, tool allowlists, policy, logging, and source routing.
Acceleration
Use Google Cloud and Vanna where they speed up analytics, retrieval, evaluation, and chat UX without bypassing the governed tool layer.
02 / Architecture

One governed front door. Many controlled data tools.

Claude, ChatGPT, internal agents, and chat apps should not each invent their own database path. They should call the same governed tool layer, and that layer should decide which source API, MCP, or curated analytics path is allowed.

AI Interfaces
Claude / ChatGPT
Conversational questions and analysis.
Vanna Chat MVP
Analytics chat over curated views and approved SQL patterns.
Internal Agents
Automated workflows, briefs, QA, follow-up.
Team Tools
Dashboards, portals, Slack-triggered actions.
MCP Gateway
Authenticated User Context
Who is asking, from which interface, with which role.
Allowed Tool Catalog
Only expose tools the user and interface can use.
Audit Log
Question, tool calls, records touched, answer metadata.
Approved Tools
Skills Route
Instructions choose the right tool and answer shape.
Business Tools Execute
Tools fetch data through APIs, MCPs, or curated views.
Restricted SQL
Admin/data-team fallback only, not the main path.
Acceleration Layer
BigQuery Remote MCP
Managed MCP path for curated analytics and admin analysis.
Gemini / Agent Search
Enterprise retrieval for indexed or federated knowledge.
Vanna Tool Memory
Golden query learning for repeated analytics questions.
Model Armor + Evals
Screen tool traffic and test real answer quality.
Source APIs / MCPs
Comm Manager API/MCP
Calls, recaps, email, timelines, tasks.
Agency Framework GraphQL
Aggregated performance metrics.
SearchAtlas API/MCP
SEO, GBP, KRT, rank and visibility data.
Slack MCP / RTS API
Internal context scoped to the user.
Data Sources
No Direct Raw DB Access
Agents do not directly query production Postgres or unrestricted BigQuery tables.
BigQuery
Analytics MCP/API over curated reporting tables.
GHL / Practice Beacon
CRM, opportunities, contacts, appointments, SMS layer.
ClickUp
Tasks, docs, agendas, workspace structure.

Important correction to the original plan: SQL is still useful, but it should be a controlled analytics capability inside the system. The platform should not depend on the LLM understanding every raw table and column to answer business questions.

03 / GCP + Vanna Acceleration

Use managed platforms to move faster without giving up control.

Google Cloud and Vanna can help the team get live faster, but they should sit behind the same governed access model. They accelerate analytics, retrieval, evaluation, and chat UX. They do not replace the MCP Gateway or the business-level tool catalog.

Control Plane

Zach's MCP Gateway

Owns identity, tool allowlists, client/location scope, audit decisions, and source routing for every interface.

Acceleration

Google Cloud + Vanna

Provides managed BigQuery MCP, Conversational Analytics, retrieval/search infrastructure, Vanna chat UX, golden query memory, tracing, and evaluation support.

Business Output

Faster, safer answers

Agents answer with citations, freshness, permission checks, and consistent metric ownership instead of improvising over raw data.

Where Google Cloud Helps
Use
BigQuery Remote MCPUse for governed BigQuery access, especially curated analytics views and admin/data-team exploration. Keep customer/location scoping above it.
Conversational AnalyticsUse data agents for analyst-style questions over curated BigQuery sources with metadata, synonyms, defaults, and golden queries.
Gemini Enterprise / Agent SearchEvaluate for indexed or federated search across knowledge sources. Confirm connector availability, Slack requirements, and access-control behavior before relying on it.
Model Armor, IAM, logging, evalsUse for MCP security screening, identity enforcement, tracing, monitoring, and regression tests against real business questions.
Where Vanna Fits
Scoped
Good fitNatural-language analytics chat, streaming tables/charts, approved SQL examples, DDL/documentation training, and tool memory for repeated questions.
BoundaryDo not make Vanna the universal access layer for calls, emails, Slack, ClickUp, SearchAtlas, and CRM workflows.
Production checkConfirm the current enterprise support path and roadmap before making it critical infrastructure, since the public GitHub repository is archived while the product docs remain active.
Best use nowKeep the current chat MVP as an analytics interface over curated views and approved tools, then compare answer quality against BigQuery Conversational Analytics.
04 / Source Readiness Matrix

Each source gets a controlled access path.

The original document correctly listed many data sources, but the access pattern should shift from raw warehouse querying to source-specific APIs, MCP tools, and curated analytics surfaces.

Source Primary Agent Access Use It For Warehouse Role Notes
Comm Manager Existing API + MCP Calls, VTT transcripts, recaps, client emails, phone calls, agendas, customer timeline, meeting-created tasks. Curated extracts only if needed. No direct agent access to production Postgres. Use API/MCP tools for permissioned business operations.
Agency Framework GraphQL API + MCP Aggregated performance metrics, normalized client/account performance summaries. Can feed BigQuery for snapshots or trend reporting. Confirm whether this is the preferred source for rollups before duplicating metric logic elsewhere.
SearchAtlas API + MCP GBP heatmaps/geogrids, KRT/SERP tracking, keyword/rank data, LLM visibility, SEO metrics, local search performance. Historical snapshots or fallback extracts. Keep live SEO/local data routed to SearchAtlas unless a specific report needs warehouse history.
Slack Official Slack MCP or Real-time Search API Internal decisions, blockers, handoffs, escalations, project context, client escalation context. Avoid broad raw sync initially. Search grouped channels through the identity resolver. Fetch full threads and return confidence, matched alias, and permalink citations.
BigQuery HIP Analytics MCP/API + Google BigQuery MCP where useful Facebook Ads, Google Ads, TikTok, Snapchat, GSC, GA4 if confirmed, GHL tables, Supermetrics, relationship tables, historical snapshots. Primary analytics warehouse. Expose curated views, business tools, and selected data agents. Keep broad SQL restricted and scoped by customer/location.
ClickUp API/MCP + BigQuery sync Tasks, docs, agendas, project state, list/folder structure, custom fields. Reporting snapshots and relationship mapping. Fresh workflow questions may need API/MCP; reporting can use BigQuery snapshots.
GHL / Practice Beacon Existing sync/API + BigQuery CRM, contacts, opportunities, appointments, tasks, notes, SMS/text communication layer. Core reporting and relationship identity tables. Use locationId as the backbone, but add ownership, drift detection, aliases, inactive clients, and manual overrides.
Vanna AI Analytics chat MVP over curated views/tools Repeated analytics questions, streaming tables/charts, golden query memory, SQL pattern learning. Uses BigQuery/Postgres only through approved datasets or business tools. Useful accelerator, not the central source of truth. Validate enterprise support path before critical reliance.
05 / Client Identity + Alias Learning

Messy names need a resolver, not hope.

Slack threads, ClickUp tasks, emails, agendas, and CRM records will not use perfect client names. The platform needs a client identity resolver that maps shorthand, misspellings, doctor names, location names, former names, and system IDs back to a canonical customer and locationId.

Resolver Contract

Tools should accept canonical IDs

Required
preferred tool shape identity first
search_client_slack_context(
  locationId: string,
  topics: string[],
  lookbackDays: number
)

Internally expands:
- canonical client name
- aliases and shorthand
- doctor/location names
- common misspellings
- ClickUp and Comm Manager IDs
Self-Improving Pipeline

Let AI propose aliases, not silently approve them

Human Review
01
Detect candidatesScan Slack, ClickUp, Comm Manager, email subjects, agendas, and CRM records for recurring shorthand or spelling drift.
02
Score with evidenceRequire source count, nearby system IDs, matching doctors/locations, thread context, and recency before trusting a candidate.
03
Queue for approvalHigh confidence can be suggested for approval. Medium confidence can be used only as possible context. Low confidence is stored for later evidence.
04
Improve future retrievalApproved aliases expand future Slack, ClickUp, Comm Manager, GHL, and BigQuery searches without manual upkeep.
Confidence Band Required Evidence Runtime Behavior Storage Status
High Multiple systems agree, client/location context matches, and recent usage is consistent. Use normally, with citations and matched alias shown when useful. Eligible for approval or policy-based auto-approval.
Medium Likely phrase match, but only one strong source or incomplete contextual proof. Return as possible related context, not as confirmed fact. Review queue with evidence.
Low Weak text similarity or ambiguous abbreviation. Do not use in final answer unless the user explicitly asks to broaden search. Candidate only; wait for repeat evidence.
06 / Tool Catalog

Tools execute. Skills route.

A skill or instruction set should help the model choose the right tool and answer format. The tool itself should enforce parameters, permissions, source routing, and output shape. For common workflows, use composite tools instead of making the model manually chain several low-level tools.

Business-Level Tools

Preferred user-facing access

Default
prepare_client_call_brief

Composite workflow for call prep across comms, tasks, performance, CRM, Slack, and agenda context.

resolve_client_identity

Maps user input to canonical customerId, locationId, aliases, and confidence evidence.

get_paid_media_performance

Composite summary across active paid channels for a client/date range.

get_google_ads_performance

Channel-specific Google Ads metrics and trend context.

get_facebook_ads_performance

Facebook Ads spend, leads, conversion, and campaign performance.

get_tiktok_ads_performance

TikTok performance when the client has that service active.

get_snapchat_ads_performance

Snapchat performance scoped by mapped ad account IDs.

get_gsc_performance

Search Console query/page performance when confirmed in BigQuery or SearchAtlas.

get_ga4_traffic_summary

GA4 traffic summary, pending final source confirmation.

get_crm_hygiene_report

GHL/Practice Beacon hygiene checks for stale opportunities, missing fields, appointment drift, and owner issues.

get_client_timeline

Recent meetings, recaps, email threads, tasks, and relevant updates.

search_client_comms

Permissioned search across calls, recaps, client emails, and docs.

search_client_slack_context

User-scoped Slack context search using locationId, aliases, grouped channels, thread citations, and confidence labels.

get_searchatlas_local_visibility

GBP heatmap/geogrid and local search visibility.

get_searchatlas_krt_rankings

Website SERP/KRT keyword ranking performance.

Restricted SQL

Admin and data-team fallback

Guarded
tool contract admin only
run_readonly_bigquery_sql(
  sql: string,
  reason: string
)

Required enforcement:
- table allowlist
- SELECT-only parser
- dry-run cost check
- row and byte limits
- locationId scope when applicable
- sensitive-column blocklist
- full audit log

Do not make this the default user path. Most users should never need raw SQL. They should ask questions that map to business tools like paid media performance, client timeline, or SearchAtlas visibility.

Composite Tool Example

get_paid_media_performance can call Google Ads, Facebook Ads, TikTok, and Snapchat under one governed workflow.

Channel Tool Example

get_google_ads_performance should only touch Google Ads data and relevant relationship tables.

Skill Role

Skills decide which tool to use and how to explain the answer. They should not be trusted as the security layer.

07 / Metric Ownership

Stop duplicate answers before agents amplify them.

If multiple systems can answer the same business question, the agent needs a priority order. Otherwise two interfaces can give different answers for the same client and date range.

Question Type Preferred Source Fallback
Paid media rollups Agency Framework if normalized BigQuery / Supermetrics
Raw channel metrics BigQuery / Supermetrics Agency Framework
GBP geogrid/local visibility SearchAtlas Historical BigQuery snapshot if synced
Website SERP/KRT rankings SearchAtlas None unless exported
Client communication timeline Comm Manager ClickUp docs for linked agendas/recaps
Internal decisions/context Slack MCP / RTS API ClickUp comments/docs if relevant
Tasks/docs/agendas ClickUp + Comm Manager links BigQuery snapshot for reporting
Open Decision

Agency Framework vs. BigQuery for performance rollups

The team should confirm which system owns normalized client performance summaries. If Agency Framework already applies the correct definitions, use it as the preferred rollup source and keep BigQuery for raw channel metrics, historical snapshots, relationship tables, and deeper data-team analysis.

Source Priority Metric Definitions Agent Routing
08 / Call Prep Workflow

Composite tools should encode real team workflows.

Call prep is the clearest example. The agent should not manually improvise the process every time. A skill can route the workflow, but a composite tool should execute the repeatable data gathering with stable inputs, permissions, citations, and freshness labels.

Composite Tool

prepare_client_call_brief(locationId, meetingId)

Recommended
01
Resolve identity and meeting contextConfirm canonical client, locationId, active services, upcoming agenda doc, and user permissions.
02
Review prior customer contextPull last call recap, transcript highlights, unresolved follow-ups, recent emails, phone calls, and SMS themes.
03
Merge performance and CRM signalsUse Agency Framework or BigQuery for paid media, SearchAtlas for search visibility, and GHL for funnel/hygiene.
04
Find internal contextSearch grouped Slack channels and ClickUp work for blockers, decisions, handoffs, overdue tasks, and open website/SEO/paid media threads.
05
Draft agenda and briefingReturn client-ready agenda sections, internal talking points, risks, citations, and freshness/confidence labels.
Data Pulls

What the workflow touches

Parallelizable
Comm Manager

Last call, VTT highlights, recap, emails, phone calls, timeline.

ClickUp

Agenda doc, tasks, overdue work, linked post-call actions.

Agency Framework

Normalized performance rollups where authoritative.

BigQuery

Channel metrics, GHL tables, Supermetrics, snapshots.

GHL

Opportunities, appointments, CRM hygiene, SMS layer.

Slack

Grouped-channel decisions, blockers, handoffs, escalations.

SearchAtlas

GBP geogrids, KRT/SERP ranks, local visibility, SEO signals.

Identity Resolver

Aliases, locationId, account mappings, confidence evidence.

09 / Speed + Quality Strategy

Fast responses come from prepared data, not shallow answers.

The ideal system should feel fast, but quality matters more than raw latency. For 500+ clients, millions of emails, and thousands of calls, the answer is caching, snapshots, indexing, and progressive refreshes rather than forcing every user question through fresh deep retrieval.

Fast Path
Cached

Serve a recent client brief snapshot first when the workflow allows it, then refresh volatile sections in parallel.

Quality Path
Cited

Every generated answer should identify source records, source freshness, confidence, and gaps instead of presenting unsupported synthesis.

Deep Path
Async

For large email/call/Slack searches, return a useful brief quickly and continue deeper retrieval as a background refresh.

Use Case Primary Path Accelerator Avoid
Paid media question Business metric tool with source ownership rules. Vanna or BigQuery Conversational Analytics over curated views. Raw SQL over every marketing table.
Call prep Composite call prep tool with cached brief snapshot and live refresh. Unstructured retrieval index, parallel tools, source freshness labels. Model manually searching each system in an ad hoc order.
Ad hoc analyst question Curated BigQuery MCP/API with row, table, and cost controls. Vanna tool memory, golden queries, BigQuery data agents. Unrestricted production database credentials.
Internal blocker search Client-scoped Slack context tool using alias resolver and thread retrieval. Slack MCP, Slack Real-time Search API, or Gemini Enterprise federation if available. Trusting fuzzy matches without evidence or confidence labels.
10 / Governance

Permission checks live outside the model.

Prompts and skills improve behavior, but they are not security controls. The Gateway, APIs, MCP servers, and curated data layer must enforce the rules before records are fetched.

Identity

Authenticated user context

Every request carries who the user is, where the request came from, and which customer/location scope they can access.

Permissions

Tool and source allowlists

The Gateway exposes only the tools and data classes allowed for that user, interface, and workflow.

AI Context

Minimized model input

Tools should return the least sensitive useful data, with summaries and citations where possible.

Audit

Answer traceability

Log the question, selected tools, source records, permission decisions, and final answer metadata.

Preserve from the original plan

Keep
  • Data quality monitoring and backfill for GHL and Supermetrics.
  • Relationship tables like location_master, location_platform_map, client_services, and SearchAtlas mappings.
  • Permission matrix thinking for transcript, recording, recap, task, and client access levels.
  • Known caveats like missing transcripts, left joins, stale touchpoint refreshes, and GHL timestamp casting.

Change from the original plan

Replace
  • Replace direct LLM-to-Postgres with Comm Manager API/MCP tools.
  • Replace broad natural-language SQL with business-level analytics tools.
  • Add Slack as an internal context source using official Slack MCP or RTS API.
  • Add Agency Framework and SearchAtlas as first-class API/MCP sources.
11 / Implementation Phases

Build the access layer before scaling agent usage.

The phases keep the original plan's foundation-first sequencing, but move the user-facing AI work toward governed tools rather than unrestricted SQL generation.

Source inventory and data ownership

Confirm what lives in Comm Manager, Agency Framework, SearchAtlas, BigQuery, ClickUp, Slack, and GHL. Decide authoritative source per metric and data class.

Foundation

Identity, permissions, and Gateway policy

Wire authenticated user context into Zach's MCP Gateway, define role/location/service access, and enforce tool allowlists before any source calls.

Security

Client identity and alias resolver

Build the customer/location resolver, alias table, evidence scoring, and review workflow before relying on Slack, ClickUp, or email search for client context.

Identity

Business-level tool catalog

Implement approved tools for client timelines, paid media, channel performance, SearchAtlas visibility, Slack context, ClickUp tasks, GHL reporting, and call prep.

Tools

Unstructured retrieval and brief snapshots

Index calls, VTT transcripts, recaps, client emails, ClickUp docs, Slack threads, and meeting agendas. Generate cached client brief snapshots with citations and freshness.

AI Context

Analytics accelerators

Expose curated analytics tools over Facebook Ads, Google Ads, TikTok, Snapchat, GSC, GA4 if confirmed, GHL, relationship tables, and historical snapshots. Compare Vanna, BigQuery MCP, and Conversational Analytics for the chat MVP.

Analytics

Skills, test questions, audit logs, rollout

Define repeatable skills, test the top business questions, verify permission failures, review answer citations, add evals/tracing, and roll out internal-only before client-facing use.

Production
12 / References

External source notes.

These sources support the updated direction: use official APIs/MCPs and managed enterprise services where they help, while keeping the Gateway and business tools as the control point.

Slack

Use official Slack MCP or Slack Real-time Search API for user-scoped internal context retrieval.

SearchAtlas

Use SearchAtlas API/MCP as the live access path for local SEO, GBP heatmaps, KRT/SERP tracking, LLM visibility, and ranking data. Endpoint-level tools should be finalized from their API documentation.

Google Cloud

Use Google Cloud's BigQuery MCP, Conversational Analytics, MCP governance, and Agent Platform evaluation/logging where they accelerate governed analytics and retrieval.

Vanna AI

Use Vanna as an analytics chat accelerator for curated views, golden queries, and streaming tables/charts. Validate current enterprise support before treating it as platform-critical.

13 / Slack Message

Message to send with the document.

Use this with the HTML attachment so the team understands the architecture change and where Zach's MCP Gateway fits.

Hey team, I took a deeper pass on the data platform plan and attached an updated ION-branded version.

The main recommendation is a slight architecture shift: we should avoid having AI agents or chat interfaces connect directly to raw databases. Instead, everything should go through a governed access layer. Zach's MCP Gateway can be the front door, then agents use approved business-level tools backed by our existing APIs/MCPs.

That means Comm Manager, Agency Framework, SearchAtlas, Slack, ClickUp, BigQuery, GHL, and the Vanna chat MVP all become controlled tool surfaces. BigQuery still matters a lot, but mostly through an Analytics MCP/API, curated views, Google BigQuery MCP where useful, and tools like get_paid_media_performance, get_google_ads_performance, get_gsc_performance, etc. A raw SQL tool can still exist, but it should be restricted to admin/data-team workflows with allowlists, cost checks, location scoping, and audit logs.

The key point: the agent should understand business capabilities, not raw warehouse tables. Skills/instructions should help route questions to the right tools and define answer formats, while the Gateway/API/MCP layer enforces permissions and logs what happened. For workflows like call prep, I think we should define a composite tool that gathers last call context, tasks, ad/CRM performance, recent emails/SMS, Slack blocker context, and agenda prep in one governed workflow instead of making the model manually search everything every time.

I also added two important pieces: a client identity/alias resolver so Slack/ClickUp/email searches can handle shorthand and misspellings, and a GCP/Vanna acceleration layer. Google Cloud and Vanna can help us get live faster for analytics, search, evals, and chat UX, but they should sit inside the governed platform rather than replace it.

I would like us to review this direction before we build too far into the original natural-language-to-SQL-over-everything approach.