Reference Card · Metadata API

Find orphaned datasources with active refreshes

A single paginated GraphQL query against the Metadata API that surfaces every published datasource with zero downstream workbooks but a still-running extract — the silent tax on your Tableau environment, made visible.

JUN 15, 2026

Every Tableau environment accumulates them: published datasources that nothing uses anymore, quietly refreshing on schedule. The workbooks that depended on them got deleted or repointed months ago, but the extract still wakes up every morning, hits the source database, and burns backgrounder capacity to refresh data no human will ever look at.

Nobody notices, because the failure is invisible — it's not an error, it's just waste. This card makes it visible with a single query: every published datasource with zero downstream workbooks that still has a live extract, so you can decommission with confidence instead of guessing.

The signal

An orphaned-but-refreshing datasource is the intersection of two facts, and the Metadata API knows both:

  1. No downstream workbooks — nothing in the environment consumes it. Lineage is the Metadata API's whole reason for existing: downstreamWorkbooks comes back empty.
  2. A live extracthasExtracts is true and extractLastRefreshTime is recent. A datasource refreshed last night with nothing downstream is the one burning compute for no one.

Neither fact alone is enough. No downstream workbooks and no extract is just dead weight — harmless. An extract with live downstream workbooks is working as intended. You want the overlap.

The query

One GraphQL query returns every published datasource with its lineage, extract status, owner, and last refresh. It's cursor-paginated from the start, so it scales from a 50-datasource site to a 50,000-datasource one without changing a line:

GraphQL
query OrphanedDatasources($cursor: String) {
  publishedDatasourcesConnection(first: 500, after: $cursor) {
    nodes {
      name
      luid
      projectName
      owner { name email }
      hasExtracts
      extractLastRefreshTime
      downstreamWorkbooks { luid }
    }
    pageInfo {
      hasNextPage
      endCursor
    }
    totalCount
  }
}

Run it once with cursor set to null. While pageInfo.hasNextPage is true, run it again with cursor set to the endCursor you just got back, and keep going until it's false. Each page hands you up to 500 nodes.

Reading the results

An orphan worth reviewing is any node where all three hold:

  • hasExtracts is true,
  • downstreamWorkbooks is empty ([]), and
  • extractLastRefreshTime is recent — within the last day or two means it's on a live schedule right now.

That's three field checks against the JSON; do them wherever you ran the query. owner.email is who to talk to before you touch anything, extractLastRefreshTime is the receipt that proves it's still costing you, and luid is the stable handle you'll use to actually find and decommission it.

Where to run it

It's just a query — run it wherever you already work:

  • GraphiQL — every Metadata-API-enabled site ships a playground at https://<your-tableau>/metadata/graphiql. Paste the query, set the cursor variable, page through by hand. Fastest way to eyeball a single site.
  • Postman / curlPOST https://<your-tableau>/api/metadata/graphql with an X-Tableau-Auth session token and a JSON body of { "query": "...", "variables": { "cursor": null } }.
  • Any script — sign in with a Personal Access Token in whatever language you like, loop the pages until hasNextPage is false, and filter the nodes. Twenty lines in Python or Node.

Pick the one that matches how often you'll run it. Eyeballing once: GraphiQL. Monthly cron: a script.

Before you delete anything

This finds candidates for review, not things to auto-delete. A few honest caveats:

  • Web authoring and live connections. A datasource consumed only through ad-hoc web authoring — never saved into a published workbook — can show zero downstream workbooks while still being used. Treat the list as a prompt to investigate, not a kill list.
  • Datasources published for reuse. Some certified datasources exist precisely so analysts can build on them. Zero downstream workbooks today might mean "newly published," not "abandoned." Check the create and refresh dates and the owner before acting.
  • Last refresh ≠ schedule. extractLastRefreshTime tells you the extract did refresh recently, not that a task is scheduled to. It's a strong proxy — something refreshed it last night — but if you want the actual schedule, cross-check the REST task list (/api/<ver>/sites/<id>/tasks/extractRefreshes).
  • Enablement. The Metadata API must be enabled on your site (on by default on Tableau Cloud and recent Server versions). If the endpoint 404s or errors, that's the first thing to check.

Run it monthly, decommission what survives review, and the silent tax goes away.