Collections

How connector data becomes queryable structures in AccessLayer.

This page explains the core idea behind querying in AccessLayer: connector data does not become queryable just because a connector exists. It becomes queryable because AccessLayer turns provider-specific APIs into structured collections and objects that the query engine and AI layer can reason about.

Why collections exist

Every provider API is different.

  • Stripe has resources such as customers, subscriptions, invoices, charges, prices, and refunds.
  • GitHub has issues, pull requests, commits, and members.
  • Notion has pages, databases, and users.

If AccessLayer exposed those APIs directly, every query path would need to understand provider-specific pagination, filters, naming, and response shapes.

Collections solve that by giving AccessLayer a stable, queryable unit of data. A collection defines:

  • where the data comes from
  • which query-time arguments it accepts
  • what the raw result looks like
  • which object schema describes each returned row

The simple mental model

At a high level, AccessLayer works like this:

  1. A connector knows how to talk to a source system. It handles provider-specific authentication, endpoints, pagination, and request patterns.
  2. AccessLayer maps source data into collections and objects. Collections define queryable entry points. Objects define the structure of the rows those collections return.
  3. The engine queries those collections instead of raw provider APIs. That shared structure is what makes SQL generation, query planning, and AI grounding reliable.
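
The flow from connector to collection to rows can be sketched in Python. This is a minimal illustration under stated assumptions, not AccessLayer's implementation: `FakeConnector`, `list_page`, `Collection`, `to_customer`, and the integer-cursor pagination are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterator, Optional

Row = dict[str, Any]


class FakeConnector:
    """Hypothetical connector: hides the pagination details of a source system."""

    def __init__(self, records: list[Row], page_size: int = 2) -> None:
        self._records = records
        self._page_size = page_size

    def list_page(self, cursor: int = 0) -> tuple[list[Row], Optional[int]]:
        """Return one page of raw records and the next cursor (None when done)."""
        page = self._records[cursor:cursor + self._page_size]
        next_cursor = cursor + self._page_size
        has_more = next_cursor < len(self._records)
        return page, (next_cursor if has_more else None)


@dataclass
class Collection:
    """Hypothetical collection: a queryable entry point over a connector."""

    name: str
    connector: FakeConnector
    to_object: Callable[[Row], Row]  # maps a raw record to the modeled row shape

    def rows(self) -> Iterator[Row]:
        """Walk every page and yield rows in the modeled object shape."""
        cursor: Optional[int] = 0
        while cursor is not None:
            page, cursor = self.connector.list_page(cursor)
            for raw in page:
                yield self.to_object(raw)


def to_customer(raw: Row) -> Row:
    # Keep only the fields the (assumed) object schema declares.
    return {"id": raw["id"], "email": raw.get("email")}


connector = FakeConnector(
    [
        {"id": "cus_1", "email": "a@example.com", "internal": "x"},
        {"id": "cus_2", "email": "b@example.com", "internal": "y"},
        {"id": "cus_3", "internal": "z"},
    ]
)
customers = Collection("stripe_customers", connector, to_customer)
all_rows = list(customers.rows())
```

The caller of `rows()` never sees cursors or provider-only fields, which is the point of querying collections rather than raw APIs.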

What a collection actually is

A collection is usually a list-style endpoint that AccessLayer exposes as a SQL table function.

That means a collection is not:

  • a direct mirror of a provider's full API surface
  • a mutable resource definition
  • a random JSON blob with no schema

It is a query-oriented access point with metadata.

For example, a collection typically includes:

  • a stable ID such as stripe.collection.invoices
  • an argument schema such as customer, status, or limit
  • a result schema describing the provider response
  • a record or object schema describing the rows you can select from
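
As a rough sketch, the four pieces of metadata listed above could be held in a structure like the following. The class name `CollectionDefinition`, its field names, and the type-name strings are assumptions made for illustration; the page does not show how AccessLayer actually stores this metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectionDefinition:
    """Hypothetical container for the metadata a collection carries."""

    id: str
    argument_schema: dict[str, str]  # argument name -> type name
    result_schema: dict[str, str]    # shape of the raw provider response
    record_schema: dict[str, str]    # shape of each selectable row


# Illustrative values only: the schemas below are simplified assumptions.
invoices = CollectionDefinition(
    id="stripe.collection.invoices",
    argument_schema={"customer": "string", "status": "string", "limit": "integer"},
    result_schema={"data": "list<record>", "has_more": "boolean"},
    record_schema={"id": "string", "status": "string", "total": "integer"},
)
```

Keeping the result schema separate from the record schema mirrors the distinction between what the provider returns and which rows you can select from.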

Concrete example: Stripe

Stripe is a good example because the connector exposes a set of collections that map closely to common finance and billing workflows.

Some of the Stripe collections currently exposed by AccessLayer include:

  • stripe_customers: Returns customers, ordered by creation time. Useful for customer counts, cohorts, and account-level joins.
  • stripe_subscriptions: Returns subscriptions and supports filters such as status, customer, and price.
  • stripe_invoices: Returns invoices across your Stripe account or for a specific customer.
  • stripe_charges: Returns charges so you can analyze transaction volume and payment activity.
  • stripe_products: Returns product catalog data.
  • stripe_prices: Returns active price definitions, including recurring and one-time pricing.

AccessLayer also exposes additional Stripe collections such as invoice items, disputes, refunds, and coupons.

What this means in practice

Instead of thinking "I need to call several Stripe endpoints manually," you can think "I need the collection that represents the dataset I want to query."

For example:

SELECT *
FROM stripe_customers('abc123');

SELECT *
FROM stripe_customers('abc123', {'limit': 25});

SELECT *
FROM stripe_subscriptions(
  'abc123',
  {'status': 'active', 'limit': 25}
);

SELECT *
FROM stripe_invoices(
  'abc123',
  {'customer': 'cus_123', 'limit': 25}
);

In those examples:

  • stripe_customers, stripe_subscriptions, and stripe_invoices are collection functions
  • 'abc123' is the connector instance ID
  • the optional second argument is a DuckDB STRUCT literal with collection-specific arguments
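
If you build these queries programmatically, a small helper can render the call shape described above. The sketch below is an assumption-laden illustration: `render_collection_query` and `to_sql_literal` are hypothetical helpers rather than part of any AccessLayer SDK, and they handle only string, integer, and boolean argument values.

```python
from typing import Any, Mapping, Optional


def to_sql_literal(value: Any) -> str:
    """Render a Python value as a SQL literal (str, int, and bool only)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"  # double embedded quotes
    raise TypeError(f"unsupported argument type: {type(value).__name__}")


def render_collection_query(
    function: str,
    instance_id: str,
    args: Optional[Mapping[str, Any]] = None,
) -> str:
    """Build a SELECT over a collection function with an optional STRUCT literal."""
    if not function.isidentifier():
        raise ValueError(f"not a valid function name: {function!r}")
    call_args = [to_sql_literal(instance_id)]
    if args:
        fields = ", ".join(
            f"{to_sql_literal(key)}: {to_sql_literal(value)}"
            for key, value in args.items()
        )
        call_args.append("{" + fields + "}")
    joined = ", ".join(call_args)
    return f"SELECT *\nFROM {function}({joined})"


query = render_collection_query(
    "stripe_subscriptions", "abc123", {"status": "active", "limit": 25}
)
```

Here `query` holds the same statement as the subscriptions example above, on two lines. Omitting `args` yields the single-argument form.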

Collections versus objects

Collections and objects are related, but they are not the same thing.

  • A collection is the entry point you query.
  • An object describes the structure of each row returned by that collection.

For example:

  • the Stripe invoices collection gives you a queryable invoices dataset
  • the Stripe invoice object describes fields such as totals, status, customer, due dates, and payment-related metadata

This separation matters because AccessLayer needs to know both:

  1. how to fetch the dataset
  2. how to describe the rows inside it

Why this helps AI and SQL generation

Collections give the platform grounding.

Instead of guessing that "Stripe probably has invoices somewhere," the AI layer can work from real metadata:

  • available collection names
  • supported arguments
  • result schemas
  • row shapes

That makes it much easier to generate valid SQL, ask better follow-up questions, and avoid hallucinating fields or filters that do not exist.
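
One way to picture this grounding is a check that runs before a generated query is executed. The sketch below assumes a hand-written `CATALOG` mapping collection names to argument names, built from the arguments mentioned on this page; `check_call` is a hypothetical helper, and a real catalog would come from AccessLayer's metadata rather than a literal.

```python
from typing import Any, Mapping

# Illustrative catalog: collection name -> argument names it accepts.
CATALOG: dict[str, set[str]] = {
    "stripe_invoices": {"customer", "status", "limit"},
    "stripe_subscriptions": {"status", "customer", "price", "limit"},
}


def check_call(
    catalog: Mapping[str, set[str]],
    collection: str,
    args: Mapping[str, Any],
) -> list[str]:
    """Return a list of problems with a proposed collection call (empty if none)."""
    if collection not in catalog:
        return [f"unknown collection: {collection}"]
    allowed = catalog[collection]
    return [
        f"unknown argument for {collection}: {name}"
        for name in sorted(args)
        if name not in allowed
    ]


ok = check_call(CATALOG, "stripe_invoices", {"customer": "cus_123", "limit": 25})
bad = check_call(CATALOG, "stripe_invoices", {"currency": "usd"})
```

An empty list means the call uses only names the metadata knows about. A non-empty list gives the AI layer something concrete to correct or to ask the user about.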

What makes a good collection

Collections are chosen for their analytical usefulness, not on the principle that every API endpoint deserves to be queryable.

A good collection usually has these properties:

  • it returns a list or tabular result
  • it has meaningful filters at query time
  • its row shape is predictable enough to model
  • it maps to questions users actually ask

For example, Stripe customers, subscriptions, invoices, charges, and prices are strong collection candidates because they support reporting and operational analysis well.

Endpoints that only mutate data or return highly specialized one-off responses are usually poor collection candidates.

Common misunderstandings
