Why Web3 Needs Indexed Data

Blockchain data is notoriously difficult to work with directly. It is raw, deeply nested, unstructured, and expensive to query repeatedly. For decentralized applications (dApps) to deliver real-time, user-facing features—like transaction histories, portfolio analytics, or protocol metrics—they need fast and flexible access to structured data.

That’s where The Graph ecosystem comes in. It offers tools to index blockchain data and make it available in an efficient, queryable format. But there’s a lot of confusion around the core components—Subgraphs, Substreams, and the newer hybrid model, Substreams-Powered Subgraphs. What exactly are they? How do they differ? And which should you use?

This post walks through each concept, explains its architecture, and shows where each shines (and where it doesn’t). Whether you’re building a DeFi dashboard, a GameFi leaderboard, or data infrastructure at scale—this is the practical deep dive.

Subgraphs: The Original Standard

Concept

A Subgraph is essentially a data indexing recipe for The Graph Node. It tells the system which smart contract events to watch, how to process the data from those events, and how to store the results in a PostgreSQL database that can be queried via GraphQL.

Subgraphs are defined declaratively (a manifest plus a schema), with mapping logic written in AssemblyScript, a strict subset of TypeScript that compiles to WebAssembly. This makes them approachable for frontend and smart contract developers, but limited in performance and expressiveness.

How It Works

The Graph Node connects to an Ethereum-compatible chain and listens for the on-chain events you've specified. Every time such an event is emitted, your AssemblyScript mappings process the data and save it into defined entities. These entities are stored in PostgreSQL and exposed as a GraphQL API.

Each entity is defined in a GraphQL schema, and each mapping is a function that processes event data and writes to those entities. All mapping logic runs single-threaded, one block at a time.
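
For example, a minimal `schema.graphql` defining the `Transfer` entity used throughout this post could look like this (the field names are illustrative, not prescribed by The Graph):

    type Transfer @entity {
      id: ID!
      from: Bytes!
      to: Bytes!
      value: BigInt!
      timestamp: BigInt!
    }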

When to Use

Subgraphs are a solid choice for straightforward dApps that need a queryable GraphQL API over contract events: transaction histories, balances, and other user-facing views, with minimal backend work.

They aren't great for heavy analytics or high-throughput data pipelines, where single-threaded, block-by-block execution becomes a bottleneck.

Sample Flow

  1. You define a `Transfer` entity in `schema.graphql`.
  2. You configure an event handler for `Transfer` events in `subgraph.yaml`.
  3. You write an AssemblyScript function that creates a `Transfer` record every time an event is emitted.
  4. The Graph Node saves that to PostgreSQL.

Once deployed, your frontend queries that data instantly using GraphQL.
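
As a sketch, a frontend could query the generated GraphQL endpoint like this in TypeScript (the endpoint URL and the `transfers` field are placeholders that depend on your deployment and schema):

    // Query the Subgraph's GraphQL endpoint from the frontend (placeholder URL).
    const ENDPOINT =
      "https://api.thegraph.com/subgraphs/name/<GITHUB_USER>/<SUBGRAPH_NAME>";

    const query = `{
      transfers(first: 5, orderBy: timestamp, orderDirection: desc) {
        id
        from
        to
        value
      }
    }`;

    async function fetchTransfers() {
      const res = await fetch(ENDPOINT, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ query }),
      });
      const { data } = await res.json();
      return data.transfers;
    }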

Installation & Example

    npm install -g @graphprotocol/graph-cli
    graph init --from-contract <CONTRACT_ADDRESS> --network mainnet <SUBGRAPH_NAME>
    cd <SUBGRAPH_NAME>

    dataSources:
      - kind: ethereum/contract
        name: Token
        network: mainnet
        source:
          address: "0x..."
          abi: Token
        mapping:
          kind: ethereum/events
          apiVersion: 0.0.7
          language: wasm/assemblyscript
          entities:
            - Transfer
          abis:
            - name: Token
              file: ./abis/Token.json
          eventHandlers:
            - event: Transfer(indexed address,indexed address,uint256)
              handler: handleTransfer
          file: ./src/mapping.ts

This sets up a pipeline from Ethereum logs → AssemblyScript mapping → PostgreSQL → GraphQL.
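
For reference, the `handleTransfer` mapping wired up above could look roughly like this; it's a sketch that assumes `graph codegen` has already generated the `Token` contract bindings and the `Transfer` entity class:

    // src/mapping.ts (classes under ../generated come from `graph codegen`)
    import { Transfer as TransferEvent } from "../generated/Token/Token";
    import { Transfer } from "../generated/schema";

    export function handleTransfer(event: TransferEvent): void {
      // Transaction hash + log index gives a unique, deterministic entity ID
      let id = event.transaction.hash.toHex() + "-" + event.logIndex.toString();
      let transfer = new Transfer(id);
      transfer.from = event.params.from;
      transfer.to = event.params.to;
      transfer.value = event.params.value;
      transfer.timestamp = event.block.timestamp;
      transfer.save();
    }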

Substreams: A Modular, High-Performance Engine

Substreams is a Rust-based framework for high-performance, modular blockchain data processing. Built by StreamingFast, Substreams gives you full control over how blockchain data is extracted, processed, and delivered. Unlike Subgraphs, Substreams does not rely on The Graph Node at all and does not store data by default.

Instead, Substreams uses Protobuf-based modules that stream data directly from a Firehose endpoint. You build your own data pipelines with Rust functions—called `map` or `store` modules—and chain them together.
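
In the manifest, chained modules look roughly like this (the module names and Protobuf types here are illustrative):

    # substreams.yaml (excerpt): a map module feeding a store module
    modules:
      - name: map_transfers
        kind: map
        inputs:
          - source: sf.ethereum.type.v2.Block
        output:
          type: proto:eth.transfers.v1.Transfers

      - name: store_transfer_counts
        kind: store
        updatePolicy: add
        valueType: int64
        inputs:
          - map: map_transfers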

Substreams modules are composable, replayable, and parallelizable. You can process blocks thousands of times faster than a Graph Node and send the output to a sink of your choosing: a file, a database, a metrics system, or a Subgraph.

Data flows like this: blockchain → Firehose → Substreams modules (`map`/`store`) → sink.

Sink Options

This is the key point: Substreams needs a sink to do anything useful. Sinks are where your processed data goes.

Examples include a database such as PostgreSQL (via `substreams-sink-postgres`), flat files, a metrics system, or a Subgraph (the hybrid model covered below).

When to Use

Substreams is ideal for analytics, data infrastructure, and large-scale pipelines where you want parallel processing and full control over where the output goes.

It isn't great for simple dApps that just need a ready-made GraphQL API, since you have to build and operate the rest of the backend yourself.

Sample Flow

  1. Write a Rust function to extract transfers from a block.
  2. Chain multiple map modules to transform or filter data.
  3. Output as Protobuf.
  4. Use a sink to send the data where you need it.

This design gives you complete flexibility and raw power, but it’s not a turnkey solution. You build the backend, and you choose what happens to the data.

Installation & Example

    curl -s https://substreams.streamingfast.io/install.sh | bash

    substreams new my_project
    cd my_project

    // Sketch: `Transfers` is a Protobuf output type you define in your .proto files.
    use substreams_ethereum::pb::eth::v2 as eth;

    #[substreams::handlers::map]
    fn map_transfers(block: eth::Block) -> Result<Transfers, substreams::errors::Error> {
        // decode ERC-20 Transfer logs from `block` and collect them into a Transfers message
        todo!()
    }
                

    substreams run -e mainnet.eth.streamingfast.io:443 \
        substreams.yaml map_transfers \
        --start-block 0 \
        --stop-block +1000 \
        --output json
                

    substreams-sink-postgres \
        --config config.yaml \
        --endpoint https://mainnet.eth.streamingfast.io \
        --manifest substreams.yaml \
        map_transfers
                

Substreams-Powered Subgraphs: The Hybrid Model

Concept

Substreams-powered Subgraphs combine the high-performance data extraction of Substreams with the convenient GraphQL querying interface of Subgraphs.

This hybrid approach lets you use Substreams to handle the heavy lifting—parallel block processing, filtering, transforming, deduping—and then pass the result into a Graph Node using a special sink.

From there, you get the best of both worlds: Substreams-level extraction speed combined with the familiar GraphQL API of a Subgraph.

Your data flow is: blockchain → Firehose → Substreams modules → Graph Node → PostgreSQL → GraphQL.

How It Works

  1. You write and compile your Substreams modules in Rust.
  2. You package the compiled modules and Protobuf definitions into a `.spkg` file.
  3. In your Subgraph’s `subgraph.yaml`, you reference that `.spkg` file and specify which module to consume.
  4. The Graph Node listens to the Substreams output and stores it like any other entity.
  5. You can still use AssemblyScript to apply last-mile transformations, or map Protobufs directly to entities.

When to Use

Substreams-powered Subgraphs are ideal for scalable dApps that need Substreams-level indexing throughput but still want to serve a frontend through a standard GraphQL API.

Sample Flow

  1. `map_transfers.rs` in Rust emits a structured list of token transfers.
  2. `substreams.spkg` is built and referenced in your Subgraph.
  3. Entities like `Transfer` are created based on that data.
  4. Graph Node stores everything in PostgreSQL and serves via GraphQL.

The end result: you get a fast, scalable indexing engine backed by a developer-friendly API layer.

Installation & Example

    substreams build

    specVersion: 0.0.5
    schema:
      file: ./schema.graphql
    dataSources:
      - kind: substreams
        name: Token
        network: mainnet
        source:
          package:
            moduleName: map_transfers
            file: ./substreams.spkg
        mapping:
          kind: substreams/graph-entities
          apiVersion: 0.0.5

    graph deploy --product hosted-service <GITHUB_USER>/<SUBGRAPH_NAME>
                

Now your data flows like this: Substreams (Rust) → `.spkg` → Graph Node → PostgreSQL → GraphQL API.

Final Comparison

Feature     | Subgraph       | Substreams                             | Substreams-Powered Subgraph
Language    | AssemblyScript | Rust                                   | Rust + AssemblyScript (optional)
Performance | Low            | High                                   | Very High
Data Flow   | Graph Node     | Rust pipeline + Sink                   | Rust pipeline + Graph Node
Storage     | PostgreSQL     | Sink-defined (PostgreSQL, files, etc.) | PostgreSQL via Graph Node
Output API  | GraphQL        | Custom (DB, file, metrics)             | GraphQL
Use Case    | Simple dApps   | Analytics / Infra / Pipelines          | Scalable dApps with frontend APIs

Which Should You Use?

Think of it like this: if you're building a straightforward dApp that just needs a GraphQL API over contract events, use a Subgraph. If you're building analytics, data infrastructure, or custom pipelines and want full control over processing and output, use Substreams with a sink of your choice. And if you need Substreams-level indexing performance behind a frontend-facing GraphQL API, use a Substreams-powered Subgraph.