Why Web3 Needs Indexed Data

Blockchain data is notoriously difficult to work with directly. It is raw, deeply nested, unstructured, and expensive to query repeatedly. For decentralized applications (dApps) to deliver real-time, user-facing features—like transaction histories, portfolio analytics, or protocol metrics—they need fast and flexible access to structured data.

That’s where The Graph ecosystem comes in. It offers tools to index blockchain data and make it available in an efficient, queryable format. But there’s a lot of confusion around the core components—Subgraphs, Substreams, and the newer hybrid model, Substreams-Powered Subgraphs. What exactly are they? How do they differ? And which should you use?

This post walks through each concept, explains its architecture, and shows where each shines (and where it doesn’t). Whether you’re building a DeFi dashboard, a GameFi leaderboard, or data infrastructure at scale—this is the practical deep dive.

Subgraphs: The Original Standard

Concept

A Subgraph is essentially a data indexing recipe for The Graph Node. It tells the system which smart contract events to watch, how to process the data from those events, and how to store the results in a PostgreSQL database that can be queried via GraphQL.

Subgraphs are defined declaratively (a manifest plus a schema), with mapping logic written in AssemblyScript, a strict subset of TypeScript that compiles to WebAssembly. This makes them approachable for frontend and smart contract developers, but limited in performance and expressiveness.

How It Works

The Graph Node connects to an Ethereum-compatible chain and listens for the on-chain events you've specified. Every time such an event is emitted, your AssemblyScript mappings process the data and save it into defined entities. These entities are stored in PostgreSQL and exposed as a GraphQL API.

Each entity is defined in a GraphQL schema, and each mapping is a function that processes event data and writes to those entities. All mapping logic runs single-threaded, one block at a time.
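
For example, a minimal `schema.graphql` defining the `Transfer` entity used throughout this post could look like this (the field names are illustrative, not prescribed by The Graph):

    type Transfer @entity {
      id: ID!
      from: Bytes!
      to: Bytes!
      value: BigInt!
      timestamp: BigInt!
    }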

When to Use

Subgraphs are a solid choice for straightforward dApps that need a queryable GraphQL API over contract events: transaction histories, balances, and other user-facing views, with minimal backend work.

They aren't great for heavy analytics or high-throughput data pipelines, where single-threaded, block-by-block execution becomes a bottleneck.

Sample Flow

  1. You define a `Transfer` entity in `schema.graphql`.
  2. You configure an event handler for `Transfer` events in `subgraph.yaml`.
  3. You write an AssemblyScript function that creates a `Transfer` record every time an event is emitted.
  4. The Graph Node saves that to PostgreSQL.

Once deployed, your frontend queries that data instantly using GraphQL.
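
As a sketch, a frontend could query the generated GraphQL endpoint like this in TypeScript (the endpoint URL and the `transfers` field are placeholders that depend on your deployment and schema):

    // Query the Subgraph's GraphQL endpoint from the frontend (placeholder URL).
    const ENDPOINT =
      "https://api.thegraph.com/subgraphs/name/<GITHUB_USER>/<SUBGRAPH_NAME>";

    const query = `{
      transfers(first: 5, orderBy: timestamp, orderDirection: desc) {
        id
        from
        to
        value
      }
    }`;

    async function fetchTransfers() {
      const res = await fetch(ENDPOINT, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ query }),
      });
      const { data } = await res.json();
      return data.transfers;
    }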

Installation & Example

    npm install -g @graphprotocol/graph-cli
    graph init --from-contract <CONTRACT_ADDRESS> --network mainnet <SUBGRAPH_NAME>
    cd <SUBGRAPH_NAME>

    dataSources:
      - kind: ethereum/contract
        name: Token
        network: mainnet
        source:
          address: "0x..."
          abi: Token
        mapping:
          kind: ethereum/events
          apiVersion: 0.0.7
          language: wasm/assemblyscript
          entities:
            - Transfer
          abis:
            - name: Token
              file: ./abis/Token.json
          eventHandlers:
            - event: Transfer(indexed address,indexed address,uint256)
              handler: handleTransfer
          file: ./src/mapping.ts

This sets up a pipeline from Ethereum logs → AssemblyScript mapping → PostgreSQL → GraphQL.
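
For reference, the `handleTransfer` mapping wired up above could look roughly like this; it's a sketch that assumes `graph codegen` has already generated the `Token` contract bindings and the `Transfer` entity class:

    // src/mapping.ts (classes under ../generated come from `graph codegen`)
    import { Transfer as TransferEvent } from "../generated/Token/Token";
    import { Transfer } from "../generated/schema";

    export function handleTransfer(event: TransferEvent): void {
      // Transaction hash + log index gives a unique, deterministic entity ID
      let id = event.transaction.hash.toHex() + "-" + event.logIndex.toString();
      let transfer = new Transfer(id);
      transfer.from = event.params.from;
      transfer.to = event.params.to;
      transfer.value = event.params.value;
      transfer.timestamp = event.block.timestamp;
      transfer.save();
    }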

Substreams: A Modular, High-Performance Engine

Substreams is a Rust-based framework for high-performance, modular blockchain data processing. Built by StreamingFast, Substreams gives you full control over how blockchain data is extracted, processed, and delivered. Unlike Subgraphs, Substreams does not rely on The Graph Node at all and does not store data by default.

Instead, Substreams uses Protobuf-based modules that stream data directly from a Firehose endpoint. You build your own data pipelines with Rust functions—called `map` or `store` modules—and chain them together.
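
In the manifest, chained modules look roughly like this (the module names and Protobuf types here are illustrative):

    # substreams.yaml (excerpt): a map module feeding a store module
    modules:
      - name: map_transfers
        kind: map
        inputs:
          - source: sf.ethereum.type.v2.Block
        output:
          type: proto:eth.transfers.v1.Transfers

      - name: store_transfer_counts
        kind: store
        updatePolicy: add
        valueType: int64
        inputs:
          - map: map_transfers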

Substreams modules are composable, replayable, and parallelizable. You can process blocks thousands of times faster than a Graph Node and send the output to a sink of your choosing: a file, a database, a metrics system, or a Subgraph.

Data flows like this: blockchain → Firehose → Substreams modules (`map`/`store`) → sink.

Sink Options

This is the key point: Substreams needs a sink to do anything useful. Sinks are where your processed data goes.

Examples include a database such as PostgreSQL (via `substreams-sink-postgres`), flat files, a metrics system, or a Subgraph (the hybrid model covered below).

When to Use

Substreams is ideal for analytics, data infrastructure, and large-scale pipelines where you want parallel processing and full control over where the output goes.

It isn't great for simple dApps that just need a ready-made GraphQL API, since you have to build and operate the rest of the backend yourself.

Sample Flow

  1. Write a Rust function to extract transfers from a block.
  2. Chain multiple map modules to transform or filter data.
  3. Output as Protobuf.
  4. Use a sink to send the data where you need it.

This design gives you complete flexibility and raw power, but it’s not a turnkey solution. You build the backend, and you choose what happens to the data.

Installation & Example

    curl -s https://substreams.streamingfast.io/install.sh | bash

    substreams new my_project
    cd my_project

    // Sketch: `Transfers` is a Protobuf output type you define in your .proto files.
    use substreams_ethereum::pb::eth::v2 as eth;

    #[substreams::handlers::map]
    fn map_transfers(block: eth::Block) -> Result<Transfers, substreams::errors::Error> {
        // decode ERC-20 Transfer logs from `block` and collect them into a Transfers message
        todo!()
    }
                

    substreams run -e mainnet.eth.streamingfast.io:443 \
        substreams.yaml map_transfers \
        --start-block 0 \
        --stop-block +1000 \
        --output json
                

    substreams-sink-postgres \
        --config config.yaml \
        --endpoint https://mainnet.eth.streamingfast.io \
        --manifest substreams.yaml \
        map_transfers
                

Substreams-Powered Subgraphs: The Hybrid Model

Concept

Substreams-powered Subgraphs combine the high-performance data extraction of Substreams with the convenient GraphQL querying interface of Subgraphs.

This hybrid approach lets you use Substreams to handle the heavy lifting—parallel block processing, filtering, transforming, deduping—and then pass the result into a Graph Node using a special sink.

From there, you get the best of both worlds: Substreams-level extraction speed combined with the familiar GraphQL API of a Subgraph.

Your data flow is: blockchain → Firehose → Substreams modules → Graph Node → PostgreSQL → GraphQL.

How It Works

  1. You write and compile your Substreams modules in Rust.
  2. You package the compiled modules and Protobuf definitions into a `.spkg` file.
  3. In your Subgraph’s `subgraph.yaml`, you reference that `.spkg` file and specify which module to consume.
  4. The Graph Node listens to the Substreams output and stores it like any other entity.
  5. You can still use AssemblyScript to apply last-mile transformations, or map Protobufs directly to entities.

When to Use

Substreams-powered Subgraphs are ideal for scalable dApps that need Substreams-level indexing throughput but still want to serve a frontend through a standard GraphQL API.

Sample Flow

  1. `map_transfers.rs` in Rust emits a structured list of token transfers.
  2. `substreams.spkg` is built and referenced in your Subgraph.
  3. Entities like `Transfer` are created based on that data.
  4. Graph Node stores everything in PostgreSQL and serves via GraphQL.

The end result: you get a fast, scalable indexing engine backed by a developer-friendly API layer.

Installation & Example

    substreams build

    specVersion: 0.0.5
    schema:
      file: ./schema.graphql
    dataSources:
      - kind: substreams
        name: Token
        network: mainnet
        source:
          package:
            moduleName: map_transfers
            file: ./substreams.spkg
        mapping:
          kind: substreams/graph-entities
          apiVersion: 0.0.5

    graph deploy --product hosted-service <GITHUB_USER>/<SUBGRAPH_NAME>
                

Now your data flows like this: Substreams (Rust) → `.spkg` → Graph Node → PostgreSQL → GraphQL API.

Final Comparison

Feature     | Subgraph       | Substreams                             | Substreams-Powered Subgraph
Language    | AssemblyScript | Rust                                   | Rust + AssemblyScript (optional)
Performance | Low            | High                                   | Very High
Data Flow   | Graph Node     | Rust pipeline + Sink                   | Rust pipeline + Graph Node
Storage     | PostgreSQL     | Sink-defined (PostgreSQL, files, etc.) | PostgreSQL via Graph Node
Output API  | GraphQL        | Custom (DB, file, metrics)             | GraphQL
Use Case    | Simple dApps   | Analytics / Infra / Pipelines          | Scalable dApps with frontend APIs

Which Should You Use?

Think of it like this: if you're building a straightforward dApp that just needs a GraphQL API over contract events, use a Subgraph. If you're building analytics, data infrastructure, or custom pipelines and want full control over processing and output, use Substreams with a sink of your choice. And if you need Substreams-level indexing performance behind a frontend-facing GraphQL API, use a Substreams-powered Subgraph.