Why Web3 Needs Indexed Data
Blockchain data is notoriously difficult to work with directly. It's raw, deeply nested, unstructured, and expensive to query repeatedly. For decentralized applications (dApps) to deliver real-time, user-facing features—like transaction histories, portfolio analytics, or protocol metrics—they need fast and flexible access to structured data.
That’s where The Graph ecosystem comes in. It offers tools to index blockchain data and make it available in an efficient, queryable format. But there’s a lot of confusion around the core components—Subgraphs, Substreams, and the newer hybrid model, Substreams-Powered Subgraphs. What exactly are they? How do they differ? And which should you use?
This post walks through each concept, explains its architecture, and shows where each shines (and where it doesn’t). Whether you’re building a DeFi dashboard, a GameFi leaderboard, or data infrastructure at scale—this is the practical deep dive.
Subgraphs: The Original Standard
Concept
A Subgraph is essentially a data indexing recipe for The Graph Node. It tells the system which smart contract events to watch, how to process the data from those events, and how to store the results in a PostgreSQL database that can be queried via GraphQL.
Subgraphs are declarative and powered by AssemblyScript, a TypeScript-like language that compiles to WebAssembly. This makes them approachable for frontend and smart contract developers, but limited in performance and expressiveness.
How It Works
The Graph Node connects to an Ethereum-compatible chain and listens for the on-chain events you've specified. Every time such an event is emitted, your AssemblyScript mappings process the data and save it into defined entities. These entities are stored in PostgreSQL and exposed as a GraphQL API.
Each entity is defined in a GraphQL schema, and each mapping is a function that processes event data and writes to those entities. All mapping logic runs in a single-threaded, per-block execution.
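For example, a minimal `schema.graphql` for the `Transfer` entity used throughout this post might look like this (the field names are illustrative, not mandated):

```graphql
type Transfer @entity(immutable: true) {
  id: Bytes!      # tx hash + log index
  from: Bytes!
  to: Bytes!
  value: BigInt!
}
```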
When to Use
Subgraphs are a solid choice for:
- UIs and dashboards needing live data from smart contracts.
- Projects with low to moderate indexing complexity.
- Teams who want fast iteration with a low barrier to entry.
They aren't great for:
- Re-indexing large chains.
- Deep analytics with heavy computation.
- Complex multi-contract data flows.
Sample Flow
- You define a `Transfer` entity in `schema.graphql`.
- You configure an event handler for `Transfer` events in `subgraph.yaml`.
- You write an AssemblyScript function that creates a `Transfer` record every time an event is emitted.
- The Graph Node saves that to PostgreSQL.
Once deployed, your frontend queries that data instantly using GraphQL.
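For instance, a query against the hypothetical `Transfer` entity above might look like:

```graphql
{
  transfers(first: 5, orderBy: value, orderDirection: desc) {
    id
    from
    to
    value
  }
}
```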
Installation & Example
```bash
npm install -g @graphprotocol/graph-cli
graph init --from-contract <CONTRACT_ADDRESS> --network mainnet
cd <SUBGRAPH_NAME>
```
- `subgraph.yaml` defines the data source and mappings:

```yaml
dataSources:
  - kind: ethereum/contract
    name: Token
    network: mainnet
    source:
      address: "0x..."
      abi: Token
    mapping:
      kind: ethereum/events
      language: wasm/assemblyscript
      entities:
        - Transfer
      eventHandlers:
        - event: Transfer(indexed address,indexed address,uint256)
          handler: handleTransfer
      file: ./src/mapping.ts
```
This sets up a pipeline from Ethereum logs → AssemblyScript mapping → PostgreSQL → GraphQL.
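To complete the picture, here is a sketch of what `./src/mapping.ts` could contain. It assumes `graph codegen` has produced the `Token` contract bindings and the `Transfer` entity class from a schema like the one shown earlier:

```typescript
import { Transfer as TransferEvent } from "../generated/Token/Token"
import { Transfer } from "../generated/schema"

export function handleTransfer(event: TransferEvent): void {
  // tx hash + log index gives each entity a unique, deterministic ID
  let entity = new Transfer(
    event.transaction.hash.concatI32(event.logIndex.toI32())
  )
  entity.from = event.params.from
  entity.to = event.params.to
  entity.value = event.params.value
  entity.save()
}
```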
Substreams: A Modular, High-Performance Engine
Substreams is a Rust-based framework for high-performance, modular blockchain data processing. Built by StreamingFast, Substreams gives you full control over how blockchain data is extracted, processed, and output. Unlike Subgraphs, Substreams do not rely on The Graph Node at all, and they do not store data by default.
Instead, Substreams use Protobuf-based modules that stream data directly from a chain firehose. You build your own data pipelines with Rust functions—called `map` or `store` modules—and chain them together.
Substreams are composable, replayable, and parallelizable. You can process blocks orders of magnitude faster than a Graph Node and send the output to a sink of your choosing: a file, a database, a metrics system, or a Subgraph.
Data flows:
- Chain → Substreams (Rust) → Protobuf Output → Sink
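To make the Protobuf output concrete, here is a hypothetical message definition a transfers module might emit (the package and field names are illustrative, not a standard):

```protobuf
syntax = "proto3";
package example.transfers.v1;

// One decoded ERC-20 transfer event.
message Transfer {
  string from = 1;
  string to = 2;
  string value = 3;   // uint256, serialized as a decimal string
  string tx_hash = 4;
}

// Output of a `map_transfers` module: all transfers in one block.
message Transfers {
  repeated Transfer transfers = 1;
}
```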
Sink Options
This is the key: Substreams need a sink to do anything useful. Sinks are where your processed data goes.
Examples include:
- substreams-sink-postgres: Store output in PostgreSQL (your own DB).
- substreams-sink-prometheus: Expose metrics.
- substreams-sink-kv: Write to a key-value store.
- substreams-sink-graph: Feed output into a Subgraph (the hybrid model covered below).
- substreams-sink-files: Output JSON or binary data to files or stdout.
When to Use
Substreams is ideal for:
- High-throughput protocols (DeFi, NFTs, gaming).
- Backends or data pipelines.
- Data analytics and off-chain metrics.
- Re-indexing from genesis at high speed.
They aren't a great fit when:
- You need a plug-and-play GraphQL API and don’t want to manage sinks.
- Your team isn’t comfortable with Rust and Protobufs.
Sample Flow
- Write a Rust function to extract transfers from a block.
- Chain multiple map modules to transform or filter data.
- Output as Protobuf.
- Use a sink to send the data where you need it.
This design gives you complete flexibility and raw power, but it’s not a turnkey solution. You build the backend, and you choose what happens to the data.
Installation & Example
- Install the CLI:

```bash
curl -s https://substreams.streamingfast.io/install.sh | bash
```
- Create a new Substreams project:

```bash
substreams new my_project
cd my_project
```
- Sample Rust module (a skeleton; `Transfers` stands in for your generated Protobuf output type):

```rust
use substreams::errors::Error;
use substreams_ethereum::pb::eth::v2 as eth;

#[substreams::handlers::map]
fn map_transfers(blk: eth::Block) -> Result<Transfers, Error> {
    // process raw block data and output transfers; see the sketch below
    todo!()
}
```
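Fleshed out, the body might look like the following sketch. It assumes `abi` is the ABI binding module that `substreams-ethereum` generates from an ERC-20 ABI at build time, and that `Transfer`/`Transfers` are the Protobuf types sketched earlier:

```rust
use substreams::errors::Error;
use substreams::Hex;
use substreams_ethereum::pb::eth::v2 as eth;
use substreams_ethereum::Event;
// Both imports below are illustrative paths for generated code:
use crate::abi::erc20::events::Transfer as TransferEvent; // from the ERC-20 ABI
use crate::pb::example::transfers::v1::{Transfer, Transfers}; // from the .proto

#[substreams::handlers::map]
fn map_transfers(blk: eth::Block) -> Result<Transfers, Error> {
    let mut transfers = vec![];
    for log in blk.logs() {
        // match_and_decode returns Some only for matching ERC-20 Transfer logs
        if let Some(ev) = TransferEvent::match_and_decode(&log) {
            transfers.push(Transfer {
                from: Hex(&ev.from).to_string(),
                to: Hex(&ev.to).to_string(),
                value: ev.value.to_string(), // uint256 as a decimal string
                tx_hash: Hex(&log.receipt.transaction.hash).to_string(),
            });
        }
    }
    Ok(Transfers { transfers })
}
```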
- Run it locally (the manifest is passed positionally, and a Firehose endpoint is required):

```bash
substreams run substreams.yaml map_transfers \
  -e mainnet.eth.streamingfast.io:443 \
  --start-block 0 \
  --stop-block +1000 \
  --output json
```
- To write to a Postgres sink:

```bash
substreams-sink-postgres run \
  "psql://user:password@localhost:5432/substreams?sslmode=disable" \
  mainnet.eth.streamingfast.io:443 \
  substreams.yaml \
  db_out
```
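One caveat worth flagging: the Postgres sink consumes a module whose output type is `DatabaseChanges` (conventionally named `db_out`), not an arbitrary map module like `map_transfers`. A hedged sketch of such a module, assuming the `substreams-database-change` crate and the `Transfers` type from earlier:

```rust
use substreams::errors::Error;
use substreams_database_change::pb::database::DatabaseChanges;
use substreams_database_change::tables::Tables;
use crate::pb::example::transfers::v1::Transfers; // illustrative generated path

// Map the transfers extracted by `map_transfers` into row changes the sink applies.
#[substreams::handlers::map]
fn db_out(transfers: Transfers) -> Result<DatabaseChanges, Error> {
    let mut tables = Tables::new();
    for t in transfers.transfers {
        tables
            .create_row("transfer", &t.tx_hash) // table name + primary key
            .set("from_address", t.from)
            .set("to_address", t.to)
            .set("value", t.value);
    }
    Ok(tables.to_database_changes())
}
```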
Substreams-Powered Subgraphs: The Hybrid Model
Concept
Substreams-powered Subgraphs combine the high-performance data extraction of Substreams with the convenient GraphQL querying interface of Subgraphs.
This hybrid approach lets you use Substreams to handle the heavy lifting—parallel block processing, filtering, transforming, deduping—and then pass the result into a Graph Node using a special sink.
From there, you get the best of both worlds:
- Performance of Substreams
- GraphQL API of Subgraphs
Your data flow is:
- Chain → Substreams → Protobuf → Substreams Sink (Graph) → Graph Node → PostgreSQL → GraphQL
How It Works
- You write and compile your Substreams modules in Rust.
- You package the Protobuf definitions into a `.spkg` file.
- In your Subgraph’s `subgraph.yaml`, you reference that `.spkg` file and specify which module to consume.
- The Graph Node listens to the Substreams output and stores it like any other entity.
- You can still use AssemblyScript to apply last-mile transformations, or map Protobufs directly to entities.
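One common pattern for the direct route (an assumption here, following the `substreams-entity-change` crate) is a final `graph_out` module that converts your Protobuf output into Graph entities, so no AssemblyScript is needed:

```rust
use substreams::errors::Error;
use substreams_entity_change::pb::entity::EntityChanges;
use substreams_entity_change::tables::Tables;
use crate::pb::example::transfers::v1::Transfers; // illustrative generated path

// Turn the output of `map_transfers` into Graph entities. Entity and field
// names must match the Subgraph's schema.graphql.
#[substreams::handlers::map]
fn graph_out(transfers: Transfers) -> Result<EntityChanges, Error> {
    let mut tables = Tables::new();
    for t in transfers.transfers {
        tables
            .create_row("Transfer", &t.tx_hash) // entity type + entity ID
            .set("from", t.from)
            .set("to", t.to)
            .set("value", t.value);
    }
    Ok(tables.to_entity_changes())
}
```

In that case, the module referenced in `subgraph.yaml` would be `graph_out` rather than the raw `map_transfers` module.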
When to Use
Substreams-powered Subgraphs are ideal for:
- Teams who need both speed and a queryable API.
- DApps with large or complex indexing needs.
- Hybrid dev teams (Rust + TypeScript).
- Replacing multiple Subgraphs with one scalable Substreams pipeline.
Sample Flow
- `map_transfers.rs` in Rust emits a structured list of token transfers.
- `substreams.spkg` is built and referenced in your Subgraph.
- Entities like `Transfer` are created based on that data.
- Graph Node stores everything in PostgreSQL and serves via GraphQL.
The end result: you get a fast, scalable indexing engine backed by a developer-friendly API layer.
Installation & Example
- Write your Substreams modules in Rust, exactly as in the Substreams section above.
- Build your package:

```bash
substreams build
```
- Reference the Substreams package in `subgraph.yaml` as a `substreams` data source:

```yaml
specVersion: 0.0.5
schema:
  file: ./schema.graphql
dataSources:
  - kind: substreams
    name: Transfers
    network: mainnet
    source:
      package:
        moduleName: map_transfers
        file: ./substreams.spkg
```
- Optionally, write AssemblyScript mappings to transform the incoming data before it is stored.
- Deploy to The Graph:

```bash
graph deploy --product hosted-service <GITHUB_USER>/<SUBGRAPH_NAME>
```

(The hosted service has since been sunset; with Subgraph Studio the equivalent is `graph deploy --studio <SUBGRAPH_SLUG>`.)
Now your data flows like this:
- Chain → Substreams (Rust) → Subgraph → PostgreSQL → GraphQL
Final Comparison
| Feature | Subgraph | Substreams | Substreams-Powered Subgraph |
|---|---|---|---|
| Language | AssemblyScript | Rust | Rust + AssemblyScript (optional) |
| Performance | Low | Very High | High |
| Data Flow | Graph Node | Rust pipeline + Sink | Rust pipeline + Graph Node |
| Storage | PostgreSQL | Sink-defined (PostgreSQL, files, etc.) | PostgreSQL via Graph Node |
| Output API | GraphQL | Custom (DB, file, metrics) | GraphQL |
| Use Case | Simple dApps | Analytics / Infra / Pipelines | Scalable dApps with frontend APIs |
Which Should You Use?
- If you need a quick, queryable data source for a smart contract—start with a Subgraph.
- If you’re building data infrastructure, analytics, or custom pipelines—go with Substreams.
- If you need both, or you’ve hit the performance ceiling of a Subgraph—switch to a Substreams-powered Subgraph.
Think of it like this:
- Subgraphs are the fastest to build.
- Substreams are the fastest to run.
- Substreams-powered Subgraphs scale like crazy *and* give you an API.