storage
Storage YAML config describes the storage providers you wish to enable.
postgres
If you wish to store the data in a postgres database you can enable the postgres storage.
Internal tables
When rindexer runs with postgres, it uses the database to manage some internal state, including the last seen block for each network and contract
and cached records of the YAML, so it can remove old indexes and foreign keys from the database. You can see those tables in a schema called rindexer_internal;
they should never be modified manually.
Own connection string
If you are deploying the indexer or want to point to an external database, you can supply your own
connection string. To do this, define or change it in the .env file.
DATABASE_URL=postgresql://[user[:password]@][host][:port][/dbname]
enabled
Whether postgres is enabled. If you do not wish to use postgres, you can set this to false or remove postgres from the storage completely.
name: rETHIndexer
description: My first rindexer project
repository: https://github.com/joshstevens19/rindexer
project_type: no-code
networks:
- name: ethereum
  chain_id: 1
  rpc: https://mainnet.gateway.tenderly.co
storage:
  postgres:
    enabled: true
drop_each_run
rindexer keeps track of the last synced block for each contract and event, so when you stop and restart the indexer
it resumes from the last synced block. rindexer will also try to create the tables and indexes again, which could clash if you
are using rindexer to grab throwaway data and want to start over each time you run it.
You can use drop_each_run to drop all the data for the indexer before starting, which ensures you start fresh.
name: rETHIndexer
description: My first rindexer project
repository: https://github.com/joshstevens19/rindexer
project_type: no-code
networks:
- name: ethereum
  chain_id: 1
  rpc: https://mainnet.gateway.tenderly.co
storage:
  postgres:
    enabled: true
    drop_each_run: true
disable_create_tables
If you do not want rindexer to create the database tables for you automatically, set this to true. By default it will create the tables for you. When this is set to true, rindexer will not write the SQL in the handlers for you either. This field is optional and can be omitted if you do not need it.
name: rETHIndexer
description: My first rindexer project
repository: https://github.com/joshstevens19/rindexer
project_type: no-code
networks:
- name: ethereum
  chain_id: 1
  rpc: https://mainnet.gateway.tenderly.co
storage:
  postgres:
    enabled: true
    disable_create_tables: true
indexes
When a database holds a lot of data, querying it can become slow. Indexes help speed up queries and are critical for the performance of the GraphQL server. By default rindexer lets you filter on any column even if it is not indexed, but here you can define indexes for the filters you commonly use in your application.
rindexer treats the ABIs as the source of truth and lets you map indexes against the names you already know from them. rindexer generates all the SQL and naming for you based on this.
global_injected_parameters
rindexer will inject common parameters into the event tables for you:
contract_address - The contract address of the event
tx_hash - The transaction hash of the event
block_number - The block number of the event
block_hash - The block hash of the event
network - The network of the event
tx_index - The transaction index of the event
log_index - The log index of the event
If your queries become slow when filtering on any of these, you can add them to global_injected_parameters
and rindexer will apply the index to all the tables it generates.
For example, if I filter on the block number and network and my queries are slow, I can add this index:
name: rETHIndexer
description: My first rindexer project
repository: https://github.com/joshstevens19/rindexer
project_type: no-code
networks:
- name: ethereum
  chain_id: 1
  rpc: https://mainnet.gateway.tenderly.co
storage:
  postgres:
    enabled: true
    indexes:
      global_injected_parameters:
        - block_number
        - network
contracts
You can then define indexes for your contracts.
name
As you can have multiple contracts in your project, you have to give the name of the contract these indexes apply to so rindexer can read its ABI.
name: rETHIndexer
description: My first rindexer project
repository: https://github.com/joshstevens19/rindexer
project_type: no-code
networks:
- name: ethereum
  chain_id: 1
  rpc: https://mainnet.gateway.tenderly.co
storage:
  postgres:
    enabled: true
    indexes:
      contracts:
        - name: LensHub
injected_parameters
This is the same as the global injected parameters but will only apply to the events of this contract.
name: rETHIndexer
description: My first rindexer project
repository: https://github.com/joshstevens19/rindexer
project_type: no-code
networks:
- name: ethereum
  chain_id: 1
  rpc: https://mainnet.gateway.tenderly.co
storage:
  postgres:
    enabled: true
    indexes:
      contracts:
        - name: LensHub
          injected_parameters:
            - block_number
            - network
events
You can define indexes for specific events in the contract. Each event is stored as a table, and you define the indexes using the input names from the ABI; rindexer transforms them into the SQL you need. An event can have multiple indexes.
name
The name of the event to apply the indexes to.
name: rETHIndexer
description: My first rindexer project
repository: https://github.com/joshstevens19/rindexer
project_type: no-code
networks:
- name: ethereum
  chain_id: 1
  rpc: https://mainnet.gateway.tenderly.co
storage:
  postgres:
    enabled: true
    indexes:
      contracts:
        - name: LensHub
          events:
            - name: QuoteCreated
injected_parameters
This is the same as the global injected parameters but will only apply to this single event.
name: rETHIndexer
description: My first rindexer project
repository: https://github.com/joshstevens19/rindexer
project_type: no-code
networks:
- name: ethereum
  chain_id: 1
  rpc: https://mainnet.gateway.tenderly.co
storage:
  postgres:
    enabled: true
    indexes:
      contracts:
        - name: LensHub
          events:
            - name: QuoteCreated
              injected_parameters:
                - tx_hash
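The three levels of injected parameters can presumably be combined in one file. The following is only a sketch: it assumes that global_injected_parameters and contracts sit side by side under indexes, and that a contract entry can carry both injected_parameters and events, as the separate examples in this section suggest. The parameter chosen at each level is illustrative.

```yaml
storage:
  postgres:
    enabled: true
    indexes:
      # applied to every event table rindexer generates
      global_injected_parameters:
        - block_number
      contracts:
        - name: LensHub
          # applied to every LensHub event table
          injected_parameters:
            - network
          events:
            - name: QuoteCreated
              # applied to the QuoteCreated table only
              injected_parameters:
                - tx_hash
```

Keeping the broadest filters at the global level and the narrow ones on a single event avoids adding indexes to tables that never use them.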
indexes
You can define your indexes here. This allows you to define many indexes for the same event.
We will use this ABI as an example, as it has tuples as well as root inputs.
{
"anonymous": false,
"inputs": [
{
"components": [
{
"internalType": "uint256",
"name": "profileId",
"type": "uint256"
},
{
"internalType": "string",
"name": "contentURI",
"type": "string"
},
{
"internalType": "uint256",
"name": "pointedProfileId",
"type": "uint256"
},
{
"internalType": "uint256",
"name": "pointedPubId",
"type": "uint256"
},
{
"internalType": "uint256[]",
"name": "referrerProfileIds",
"type": "uint256[]"
},
{
"internalType": "uint256[]",
"name": "referrerPubIds",
"type": "uint256[]"
},
{
"internalType": "bytes",
"name": "referenceModuleData",
"type": "bytes"
},
{
"internalType": "address[]",
"name": "actionModules",
"type": "address[]"
},
{
"internalType": "bytes[]",
"name": "actionModulesInitDatas",
"type": "bytes[]"
},
{
"internalType": "address",
"name": "referenceModule",
"type": "address"
},
{
"internalType": "bytes",
"name": "referenceModuleInitData",
"type": "bytes"
}
],
"indexed": false,
"internalType": "struct Types.QuoteParams",
"name": "quoteParams",
"type": "tuple"
},
{
"indexed": true,
"internalType": "uint256",
"name": "pubId",
"type": "uint256"
},
{
"indexed": false,
"internalType": "bytes",
"name": "referenceModuleReturnData",
"type": "bytes"
},
{
"indexed": false,
"internalType": "bytes[]",
"name": "actionModulesInitReturnDatas",
"type": "bytes[]"
},
{
"indexed": false,
"internalType": "bytes",
"name": "referenceModuleInitReturnData",
"type": "bytes"
},
{
"indexed": false,
"internalType": "address",
"name": "transactionExecutor",
"type": "address"
},
{
"indexed": false,
"internalType": "uint256",
"name": "timestamp",
"type": "uint256"
}
],
"name": "QuoteCreated",
"type": "event"
}
event_input_names
You may want to index a single field, or you may wish to use a composite index to filter or sort by multiple columns.
single root field
Let's say I want to add an index for transactionExecutor.
I look in the ABI for that field and see it is not in a tuple
but directly on the root of inputs, so I take the input name and apply it to the YAML file.
{
...
"inputs": [
...
{
"indexed": false,
"internalType": "address",
"name": "transactionExecutor",
"type": "address"
},
...
],
"name": "QuoteCreated",
"type": "event"
},
name: rETHIndexer
description: My first rindexer project
repository: https://github.com/joshstevens19/rindexer
project_type: no-code
networks:
- name: ethereum
  chain_id: 1
  rpc: https://mainnet.gateway.tenderly.co
storage:
  postgres:
    enabled: true
    indexes:
      contracts:
        - name: LensHub
          events:
            - name: QuoteCreated
              indexes:
                - event_input_names:
                    - transactionExecutor
This will create a SQL index like the below:
CREATE INDEX idx_quote_created_transaction_executor
ON lens_indexer_lens_hub_quote_created (transaction_executor);
tuple field
If you want to add an index on a field which is within a tuple, you can do this by mapping the object location.
Let's say I want to add an index on the quoteParams referenceModule field.
{
"anonymous": false,
"inputs": [
{
"components": [
...
{
"internalType": "address",
"name": "referenceModule",
"type": "address"
},
...
],
"indexed": false,
"internalType": "struct Types.QuoteParams",
"name": "quoteParams",
"type": "tuple"
},
...
],
"name": "QuoteCreated",
"type": "event"
},
I would just map this in the yaml file:
name: rETHIndexer
description: My first rindexer project
repository: https://github.com/joshstevens19/rindexer
project_type: no-code
networks:
- name: ethereum
  chain_id: 1
  rpc: https://mainnet.gateway.tenderly.co
storage:
  postgres:
    enabled: true
    indexes:
      contracts:
        - name: LensHub
          events:
            - name: QuoteCreated
              indexes:
                - event_input_names:
                    - "quoteParams.referenceModule"
This will create a SQL index like the below:
CREATE INDEX idx_quote_created_quote_params_reference_module
ON lens_indexer_lens_hub_quote_created (quote_params_reference_module);
multiple indexed fields
You may want to index multiple fields if you filter or order on many fields. Composite indexes are supported in the SQL database, and you can define one by listing each field, mapping the object location for any field inside a tuple.
Let's say I want to add an index on the quoteParams referenceModule field alongside transactionExecutor.
{
"anonymous": false,
"inputs": [
{
"components": [
...
{
"internalType": "address",
"name": "referenceModule",
"type": "address"
},
...
],
"indexed": false,
"internalType": "struct Types.QuoteParams",
"name": "quoteParams",
"type": "tuple"
},
{
"indexed": false,
"internalType": "address",
"name": "transactionExecutor",
"type": "address"
},
],
"name": "QuoteCreated",
"type": "event"
},
I would just map this in the yaml file:
name: rETHIndexer
description: My first rindexer project
repository: https://github.com/joshstevens19/rindexer
project_type: no-code
networks:
- name: ethereum
  chain_id: 1
  rpc: https://mainnet.gateway.tenderly.co
storage:
  postgres:
    enabled: true
    indexes:
      contracts:
        - name: LensHub
          events:
            - name: QuoteCreated
              indexes:
                - event_input_names:
                    - transactionExecutor
                    - "quoteParams.referenceModule"
This will create a SQL index like the below:
CREATE INDEX idx_quote_created_transaction_executor_quote_params_reference_module
ON lens_indexer_lens_hub_quote_created (transaction_executor, quote_params_reference_module);
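Since an event can have multiple indexes, the single-field and composite forms can sit next to each other. A minimal sketch, assuming each entry under indexes produces its own SQL index; the particular combination of fields is only illustrative:

```yaml
storage:
  postgres:
    enabled: true
    indexes:
      contracts:
        - name: LensHub
          events:
            - name: QuoteCreated
              indexes:
                # first index: a single field inside the quoteParams tuple
                - event_input_names:
                    - "quoteParams.referenceModule"
                # second index: a composite of a root field and a tuple field
                - event_input_names:
                    - transactionExecutor
                    - "quoteParams.referenceModule"
```

In a composite index the order of the fields matters to the database, so list first the field you filter on most often.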
relationships
You can define relationships between events. This adds foreign keys to the database and also processes the events in the correct order. Note that rindexer always optimises for speed unless told otherwise: on historic data it drops any foreign keys and indexes the events concurrently, then re-applies the relationships before indexing the live data. If you still want an event to run only once the other one has run, you can look into dependency events.
You can define many relationships in the same YAML file.
We will use these ABIs as an example, as they have tuples as well as root inputs.
{
"anonymous": false,
"inputs": [
{
"components": [
{
"internalType": "uint256",
"name": "profileId",
"type": "uint256"
},
{
"internalType": "string",
"name": "contentURI",
"type": "string"
},
{
"internalType": "uint256",
"name": "pointedProfileId",
"type": "uint256"
},
{
"internalType": "uint256",
"name": "pointedPubId",
"type": "uint256"
},
{
"internalType": "uint256[]",
"name": "referrerProfileIds",
"type": "uint256[]"
},
{
"internalType": "uint256[]",
"name": "referrerPubIds",
"type": "uint256[]"
},
{
"internalType": "bytes",
"name": "referenceModuleData",
"type": "bytes"
},
{
"internalType": "address[]",
"name": "actionModules",
"type": "address[]"
},
{
"internalType": "bytes[]",
"name": "actionModulesInitDatas",
"type": "bytes[]"
},
{
"internalType": "address",
"name": "referenceModule",
"type": "address"
},
{
"internalType": "bytes",
"name": "referenceModuleInitData",
"type": "bytes"
}
],
"indexed": false,
"internalType": "struct Types.QuoteParams",
"name": "quoteParams",
"type": "tuple"
},
{
"indexed": true,
"internalType": "uint256",
"name": "pubId",
"type": "uint256"
},
{
"indexed": false,
"internalType": "bytes",
"name": "referenceModuleReturnData",
"type": "bytes"
},
{
"indexed": false,
"internalType": "bytes[]",
"name": "actionModulesInitReturnDatas",
"type": "bytes[]"
},
{
"indexed": false,
"internalType": "bytes",
"name": "referenceModuleInitReturnData",
"type": "bytes"
},
{
"indexed": false,
"internalType": "address",
"name": "transactionExecutor",
"type": "address"
},
{
"indexed": false,
"internalType": "uint256",
"name": "timestamp",
"type": "uint256"
}
],
"name": "QuoteCreated",
"type": "event"
}
contract_name
As you can have multiple contracts in your project, you have to give the name of the contract the relationship starts from so rindexer can read its ABI.
name: rETHIndexer
description: My first rindexer project
repository: https://github.com/joshstevens19/rindexer
project_type: no-code
networks:
- name: ethereum
  chain_id: 1
  rpc: https://mainnet.gateway.tenderly.co
storage:
  postgres:
    enabled: true
    relationships:
      - contract_name: LensHub
event_name
The name of the event to apply the relationship to.
name: rETHIndexer
description: My first rindexer project
repository: https://github.com/joshstevens19/rindexer
project_type: no-code
networks:
- name: ethereum
  chain_id: 1
  rpc: https://mainnet.gateway.tenderly.co
storage:
  postgres:
    enabled: true
    relationships:
      - contract_name: LensHub
        event_name: QuoteCreated
event_input_name
This can be a tuple object mapping or a single field, both of which are explained above.
Let's say we want to link the QuoteCreated event's quoteParams.profileId to the profile id of another event.
{
"anonymous": false,
"inputs": [
{
"components": [
{
"internalType": "uint256",
"name": "profileId",
"type": "uint256"
},
...
],
"indexed": false,
"internalType": "struct Types.QuoteParams",
"name": "quoteParams",
"type": "tuple"
},
...
],
"name": "QuoteCreated",
"type": "event"
}
Let's add that field to the event_input_name:
name: rETHIndexer
description: My first rindexer project
repository: https://github.com/joshstevens19/rindexer
project_type: no-code
networks:
- name: ethereum
  chain_id: 1
  rpc: https://mainnet.gateway.tenderly.co
storage:
  postgres:
    enabled: true
    relationships:
      - contract_name: LensHub
        event_name: QuoteCreated
        event_input_name: "quoteParams.profileId"
linked_to
Now we have to map what this is linked to.
contract_name
Define the contract name to link to.
name: rETHIndexer
description: My first rindexer project
repository: https://github.com/joshstevens19/rindexer
project_type: no-code
networks:
- name: ethereum
  chain_id: 1
  rpc: https://mainnet.gateway.tenderly.co
storage:
  postgres:
    enabled: true
    relationships:
      - contract_name: LensHub
        event_name: QuoteCreated
        event_input_name: "quoteParams.profileId"
        linked_to:
          - contract_name: LensHub
event_name
Define the event name to link to.
name: rETHIndexer
description: My first rindexer project
repository: https://github.com/joshstevens19/rindexer
project_type: no-code
networks:
- name: ethereum
  chain_id: 1
  rpc: https://mainnet.gateway.tenderly.co
storage:
  postgres:
    enabled: true
    relationships:
      - contract_name: LensHub
        event_name: QuoteCreated
        event_input_name: "quoteParams.profileId"
        linked_to:
          - contract_name: LensHub
            event_name: ProfileMetadataSet
event_input_name
Map the event input name to link to. Its ABI type MUST match the ABI type of the event_input_name above.
{
"anonymous": false,
"inputs": [
{
"indexed": true,
"internalType": "uint256",
"name": "profileId",
"type": "uint256"
},
{
"indexed": false,
"internalType": "string",
"name": "metadata",
"type": "string"
},
{
"indexed": false,
"internalType": "address",
"name": "transactionExecutor",
"type": "address"
},
{
"indexed": false,
"internalType": "uint256",
"name": "timestamp",
"type": "uint256"
}
],
"name": "ProfileMetadataSet",
"type": "event"
}
name: rETHIndexer
description: My first rindexer project
repository: https://github.com/joshstevens19/rindexer
project_type: no-code
networks:
- name: ethereum
  chain_id: 1
  rpc: https://mainnet.gateway.tenderly.co
storage:
  postgres:
    enabled: true
    relationships:
      - contract_name: LensHub
        event_name: QuoteCreated
        event_input_name: "quoteParams.profileId"
        linked_to:
          - contract_name: LensHub
            event_name: ProfileMetadataSet
            event_input_name: profileId
That is it: we have now linked the QuoteCreated event's quoteParams.profileId to the ProfileMetadataSet event's profileId.
name: rETHIndexer
description: My first rindexer project
repository: https://github.com/joshstevens19/rindexer
project_type: no-code
networks:
- name: ethereum
  chain_id: 1
  rpc: https://mainnet.gateway.tenderly.co
storage:
  postgres:
    enabled: true
    relationships:
      - contract_name: LensHub
        event_name: QuoteCreated
        event_input_name: "quoteParams.profileId"
        linked_to:
          - contract_name: LensHub
            event_name: ProfileMetadataSet
            event_input_name: profileId
You can read more about how this changes the GraphQL ability to query the data here.
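Because many relationships can be defined in the same YAML file, a second entry can be listed next to the first. The sketch below assumes you also want quoteParams.pointedProfileId (a uint256 in the QuoteCreated ABI) linked to the same ProfileMetadataSet profileId. Whether that second link makes sense for your data is an assumption made purely for illustration.

```yaml
storage:
  postgres:
    enabled: true
    relationships:
      - contract_name: LensHub
        event_name: QuoteCreated
        event_input_name: "quoteParams.profileId"
        linked_to:
          - contract_name: LensHub
            event_name: ProfileMetadataSet
            event_input_name: profileId
      # second relationship; both sides are uint256, so the ABI types match
      - contract_name: LensHub
        event_name: QuoteCreated
        event_input_name: "quoteParams.pointedProfileId"
        linked_to:
          - contract_name: LensHub
            event_name: ProfileMetadataSet
            event_input_name: profileId
```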
clickhouse
If you wish to store the data in a clickhouse database with the no-code project you can enable the clickhouse storage.
Internal tables
When rindexer runs with clickhouse, it uses the database to manage some internal state, including the last seen block for each network and contract.
You can see those tables in a schema called rindexer_internal; they should never be modified manually.
Own connection string
If you are deploying the indexer or want to point to an external database, you can supply your own
connection string. To do this, define or change it in the .env file.
CLICKHOUSE_URL="http://[host]:[port]"
CLICKHOUSE_DB="default"
CLICKHOUSE_USER="default"
CLICKHOUSE_PASSWORD="default"
enabled
Whether clickhouse is enabled. If you do not wish to use clickhouse, you can set this to false or remove clickhouse from the storage completely.
name: rETHIndexer
description: My first rindexer project
repository: https://github.com/joshstevens19/rindexer
project_type: no-code
networks:
- name: ethereum
  chain_id: 1
  rpc: https://mainnet.gateway.tenderly.co
storage:
  clickhouse:
    enabled: true
drop_each_run
rindexer keeps track of the last synced block for each contract and event, so when you stop and restart the indexer it resumes from the last synced block. rindexer will also try to create the tables again, which could clash if you are using rindexer to grab throwaway data and want to start over each time you run it.
You can use drop_each_run to drop all the data for the indexer before starting, which ensures you start fresh.
name: rETHIndexer
description: My first rindexer project
repository: https://github.com/joshstevens19/rindexer
project_type: no-code
networks:
- name: ethereum
  chain_id: 1
  rpc: https://mainnet.gateway.tenderly.co
storage:
  clickhouse:
    enabled: true
    drop_each_run: true
disable_create_tables
If you do not want rindexer to create the database tables for you automatically, set this to true. By default it will create the tables for you. When this is set to true, rindexer will not write the SQL in the handlers for you either. This field is optional and can be omitted if you do not need it.
rindexer will still create its internal tables for tracking the last known block.
name: rETHIndexer
description: My first rindexer project
repository: https://github.com/joshstevens19/rindexer
project_type: no-code
networks:
- name: ethereum
  chain_id: 1
  rpc: https://mainnet.gateway.tenderly.co
storage:
  clickhouse:
    enabled: true
    disable_create_tables: true
indexes
ClickHouse does not use indexes like other databases; instead, the order by clause on the storage engine is critical for performance.
By default we use an order by clause of:
ORDER BY (network, block_number, tx_hash, log_index)
This allows efficient searches over per-network block ranges and is a valid "uniqueness" constraint, meaning it lets us ensure the indexed data does not contain duplicates. It also works with or without timestamps being enabled.
To assist with performance on common queries, we automatically opt tables in to minmax indexes on block_number and block_timestamp and add bloom filters for tx_hash and network. This allows fast queries for any generic block-pruning query or transaction lookup.
index idx_block_num (block_number) type minmax granularity 1
index idx_timestamp (block_timestamp) type minmax granularity 1
index idx_network (network) type bloom_filter granularity 1
index idx_tx_hash (tx_hash) type bloom_filter granularity 1
additional information
You could add custom indexes on fields like from or to as needed. However, indexing in OLAP databases is a complex topic.
If you wish to hyper-optimise for specific query patterns, such as filtering on a particular field like a wallet address, it is more
appropriate to use the Rust project and custom tables, where you can control the order by to index on your primary
filter constraint first.
An example of this is ERC20 transfers, where we want to search quickly on a wallet address. In this case it is most beneficial
to either create a projection or denormalize the from and to inserts directly into a unified wallet_address table with a direction field.
An example of this is shown below (the direction is stored in the is_send column); it allows extremely optimised wallet_address = ? queries
in descending block timestamp order:
create table if not exists erc20_transfer
(
    block_timestamp      DateTime('UTC'),
    block_number         UInt64,
    network_id           UInt32,
    transaction_index    UInt16,
    log_index            UInt16,
    currency_address     FixedString(20),
    wallet_address       FixedString(20),
    counterparty_address FixedString(20),
    transaction_hash     FixedString(32),
    amount               UInt256,
    is_send              Bool
)
engine = ReplacingMergeTree
order by (wallet_address, block_timestamp, transaction_hash, log_index);
csv
If you wish to store the data in CSV files you can enable the csv storage.
Last synced block state
When indexing with csv while postgres is disabled, rindexer keeps the last seen block for each network and contract in a txt file within the
path the CSV files are written to. This ensures that if the indexer goes down it can pick up where it left off.
You can see those txt files under the csv path: in each contract name's folder there is a folder called last-synced-blocks, and each event has
a txt file with the last seen block. If you are using csv and postgres is enabled, the last seen block is stored in the database.
enabled
Whether csv is enabled. If you do not wish to use csv, you can set this to false or remove csv from the storage completely.
name: rETHIndexer
description: My first rindexer project
repository: https://github.com/joshstevens19/rindexer
project_type: no-code
networks:
- name: ethereum
  chain_id: 1
  rpc: https://mainnet.gateway.tenderly.co
storage:
  csv:
    enabled: true
path
The path to store the CSV files. It should be a directory path; if it does not exist it will be created in the project directory
in a folder called generated_csv.
name: rETHIndexer
description: My first rindexer project
repository: https://github.com/joshstevens19/rindexer
project_type: no-code
networks:
- name: ethereum
  chain_id: 1
  rpc: https://mainnet.gateway.tenderly.co
storage:
  csv:
    enabled: true
    path: ./generated_csv
disable_create_headers
If you do not want rindexer to create CSV headers for you automatically, set this to true. By default it will create the CSV headers for you. When this is set to true, rindexer will not write the CSV code in the handlers for you either. This field is optional and can be omitted if you do not need it.
name: rETHIndexer
description: My first rindexer project
repository: https://github.com/joshstevens19/rindexer
project_type: no-code
networks:
- name: ethereum
  chain_id: 1
  rpc: https://mainnet.gateway.tenderly.co
storage:
  csv:
    enabled: true
    path: ./generated_csv
    disable_create_headers: true
Multiple Storage Providers
You can have multiple storage providers in the YAML file.
name: rETHIndexer
description: My first rindexer project
repository: https://github.com/joshstevens19/rindexer
project_type: no-code
networks:
- name: ethereum
  chain_id: 1
  rpc: https://mainnet.gateway.tenderly.co
storage:
  postgres:
    enabled: true
  csv:
    enabled: true
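Provider-specific options can be set alongside enabled for each provider. The following is a sketch that combines options described earlier on this page; the values are illustrative, and it assumes the options behave the same when both providers are enabled. With both enabled, the last synced block is stored in the database rather than in txt files, as noted in the csv section.

```yaml
storage:
  postgres:
    enabled: true
    # start from a clean database on every run
    drop_each_run: true
  csv:
    enabled: true
    # directory the CSV files are written to
    path: ./generated_csv
```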