A light, fast, and memory-efficient collection traversal library for Firestore and Node.js. Built by Proficient AI.
Firewalk is a Node.js library that walks you through Firestore collections.
When a collection holds millions of documents and you need to modify them or just read them, you can't retrieve them all at once because your program's memory usage will explode. Firewalk's configurable traverser objects let you do this in a simple, intuitive, and memory-efficient way using batch processing with concurrency control.
Firewalk is a lightweight, well-typed library that is useful in a variety of scenarios. You can use it in database migration scripts (e.g. when you need to add a new field to all docs), in a scheduled Cloud Function that needs to check every doc in a collection periodically, or in a locally run script that retrieves some data from a collection.
Note: This library was previously known as Firecode. We're currently in the process of porting over the documentation from the previous site.
Firewalk is designed to work with the Firebase Admin SDK, so if you haven't already installed it, run:
# npm
npm install firebase-admin
# yarn
yarn add firebase-admin
Then install Firewalk:
# npm
npm install -E firewalk
# yarn
yarn add -E firewalk
There are only 2 kinds of objects you need to be familiar with when using this library:
Traverser: An object that walks you through a collection of documents (or more generally a Traversable).
Migrator: A convenience object used for database migrations. It lets you easily write to the documents within a given traversable and uses a traverser to do that. You can easily write your own migration logic in the traverser callback if you don't want to use a migrator.
Suppose we have a users collection and we want to send an email to each user. This is how easy it is to do that efficiently with a Firewalk traverser:
import { firestore } from 'firebase-admin';
import { createTraverser } from 'firewalk';

const usersCollection = firestore().collection('users');
const traverser = createTraverser(usersCollection);

// sendEmail is assumed to be your own async email helper; it is not part of Firewalk.
const { batchCount, docCount } = await traverser.traverse(async (batchDocs, batchIndex) => {
  const batchSize = batchDocs.length;
  // Email every user in this batch in parallel and wait for all of them.
  await Promise.all(
    batchDocs.map(async (doc) => {
      const { email, firstName } = doc.data();
      await sendEmail({ to: email, content: `Hello ${firstName}!` });
    })
  );
  console.log(`Batch ${batchIndex} done! We emailed ${batchSize} users in this batch.`);
});

console.log(`Traversal done! We emailed ${docCount} users in ${batchCount} batches!`);
We are doing 3 things here:
1. Create a reference to the users collection
2. Pass that reference to the createTraverser() function
3. Invoke .traverse() with an async callback that is called for each batch of document snapshots

This pretty much sums up the core functionality of this library! The .traverse() method returns a Promise that resolves when the entire traversal finishes, which can take a while if you have millions of docs. The Promise resolves with an object containing the traversal details, e.g. the number of docs you touched.
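To make the batch-by-batch contract concrete, here is a minimal in-memory sketch of the same idea. It is not Firewalk's implementation: `traverseInBatches` and the array-backed `docs` source are hypothetical stand-ins for a paginated Firestore query, and only the result shape (`batchCount`, `docCount`) mirrors the example above.

```typescript
// Hypothetical result shape, mirroring the fields used in the example above.
type TraversalResult = { batchCount: number; docCount: number };

// Walks `docs` in slices of `batchSize` and awaits the callback for each slice
// before reading the next one, so only one batch is processed at a time.
async function traverseInBatches<T>(
  docs: readonly T[],
  batchSize: number,
  callback: (batchDocs: T[], batchIndex: number) => Promise<void>
): Promise<TraversalResult> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error('batchSize must be a positive integer');
  }
  let batchCount = 0;
  let docCount = 0;
  for (let start = 0; start < docs.length; start += batchSize) {
    const batchDocs = docs.slice(start, start + batchSize);
    await callback(batchDocs, batchCount);
    batchCount += 1;
    docCount += batchDocs.length;
  }
  return { batchCount, docCount };
}
```

Calling it with 7 items and a batch size of 3 invokes the callback three times (with 3, 3, and 1 items) and resolves with the total counts once the last callback has finished.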
Traverse faster by increasing concurrency:
const projectsColRef = firestore().collection('projects');
const traverser = createTraverser(projectsColRef, {
  batchSize: 500,
  // This means we are prepared to hold 500 * 20 = 10,000 docs in memory.
  // We sacrifice some memory to traverse faster.
  maxConcurrentBatchCount: 20,
});
const { docCount } = await traverser.traverse(async (_, batchIndex) => {
  console.log(`Gonna process batch ${batchIndex} now!`);
  // ...
});
console.log(`Traversed ${docCount} projects super-fast!`);
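The comment in the example above multiplies batchSize by maxConcurrentBatchCount to bound the number of docs held in memory. As a rough worked version of that arithmetic, the sketch below also converts the bound to bytes using the space bound maxConcurrentBatchCount * (batchSize * D + S) stated in the complexity notes of this document. `estimatePeakMemory` is a hypothetical helper, not part of Firewalk, and the 2 KiB average document size is purely an illustrative assumption.

```typescript
// Inputs for the estimate. The last two values are assumptions you supply.
type MemoryEstimateInput = {
  batchSize: number;
  maxConcurrentBatchCount: number;
  avgDocSizeBytes: number; // D: assumed average document size
  callbackExtraBytes: number; // S: assumed extra space used by the callback per batch
};

// Rough upper bound on the docs and bytes held in memory during a traversal.
function estimatePeakMemory(input: MemoryEstimateInput): {
  maxDocsInMemory: number;
  approxBytes: number;
} {
  const { batchSize, maxConcurrentBatchCount, avgDocSizeBytes, callbackExtraBytes } = input;
  const maxDocsInMemory = batchSize * maxConcurrentBatchCount;
  const approxBytes = maxConcurrentBatchCount * (batchSize * avgDocSizeBytes + callbackExtraBytes);
  return { maxDocsInMemory, approxBytes };
}

const estimate = estimatePeakMemory({
  batchSize: 500,
  maxConcurrentBatchCount: 20,
  avgDocSizeBytes: 2048, // illustrative assumption, not a measured value
  callbackExtraBytes: 0,
});
console.log(estimate.maxDocsInMemory); // 10000
console.log(estimate.approxBytes); // 20480000 (about 20 MB)
```

An estimate like this is only a starting point; measuring real document sizes in your own collection will give a better number.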
Add a new field using a migrator:
import { createMigrator } from 'firewalk';

const projectsColRef = firestore().collection('projects');
const migrator = createMigrator(projectsColRef);
const { migratedDocCount } = await migrator.update('isCompleted', false);
console.log(`Updated ${migratedDocCount} projects!`);
Add a new field derived from the previous fields:
type UserDoc = {
  firstName: string;
  lastName: string;
};
const usersColRef = firestore().collection('users') as firestore.CollectionReference<UserDoc>;
const migrator = createMigrator(usersColRef);
const { migratedDocCount } = await migrator.updateWithDerivedData((snap) => {
  const { firstName, lastName } = snap.data();
  return {
    fullName: `${firstName} ${lastName}`,
  };
});
console.log(`Updated ${migratedDocCount} users!`);
Migrate faster by increasing concurrency:
const projectsColRef = firestore().collection('projects');
const migrator = createMigrator(projectsColRef, { maxConcurrentBatchCount: 25 });
const { migratedDocCount } = await migrator.update('isCompleted', false);
console.log(`Updated ${migratedDocCount} projects super-fast!`);
Change the traversal config:
const walletsWithNegativeBalance = firestore().collection('wallets').where('money', '<', 0);
const migrator = createMigrator(walletsWithNegativeBalance, {
  // We want each batch to have 500 docs. The very last batch may have fewer than 500.
  batchSize: 500,
  // We want to wait 500ms before moving to the next batch.
  sleepTimeBetweenBatches: 500,
});
// Wipe out their debts!
const { migratedDocCount } = await migrator.set({ money: 0 });
console.log(`Set ${migratedDocCount} wallets!`);
Rename a field:
const postsColGroup = firestore().collectionGroup('posts');
const migrator = createMigrator(postsColGroup);
const { migratedDocCount } = await migrator.renameField('postedAt', 'publishedAt');
console.log(`Updated ${migratedDocCount} posts!`);
You can find the full API reference for firewalk in the project documentation. We maintain detailed docs for every version! Here are some of the core functions that this library provides.
createTraverser
Creates an object which can be used to traverse a Firestore collection or, more generally, a Traversable.
For each batch of document snapshots in the traversable, the traverser invokes a specified async callback and immediately moves to the next batch. It does not wait for the callback Promise to resolve before moving to the next batch. That is, when maxConcurrentBatchCount > 1, there is no guarantee that any given batch will finish processing before a later batch.
The traverser becomes faster as you increase maxConcurrentBatchCount, but this will consume more memory. You should increase concurrency when you want to trade some memory for speed.
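The effect of maxConcurrentBatchCount can be sketched with a small worker pool that caps how many batch callbacks are in flight at once. This is an illustration of the general technique under simplifying assumptions, not Firewalk's actual scheduler: `runWithConcurrencyLimit` is a hypothetical helper, and batches are represented only by their index.

```typescript
// Runs `task` once per batch index while keeping at most
// `maxConcurrentBatchCount` tasks in flight. Resolves with the peak number of
// tasks that were running at the same time.
async function runWithConcurrencyLimit(
  batchCount: number,
  maxConcurrentBatchCount: number,
  task: (batchIndex: number) => Promise<void>
): Promise<number> {
  if (!Number.isInteger(maxConcurrentBatchCount) || maxConcurrentBatchCount < 1) {
    throw new Error('maxConcurrentBatchCount must be a positive integer');
  }
  let nextBatchIndex = 0;
  let inFlight = 0;
  let peakInFlight = 0;

  // Each worker keeps pulling the next unprocessed batch index until none remain.
  const worker = async (): Promise<void> => {
    while (nextBatchIndex < batchCount) {
      const batchIndex = nextBatchIndex;
      nextBatchIndex += 1;
      inFlight += 1;
      peakInFlight = Math.max(peakInFlight, inFlight);
      try {
        await task(batchIndex);
      } finally {
        inFlight -= 1;
      }
    }
  };

  const workerCount = Math.min(maxConcurrentBatchCount, batchCount);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return peakInFlight;
}
```

With a limit of 1 the batches complete strictly in order. With a higher limit, a slow batch can finish after a later one, which is the ordering caveat described above, and more batches are alive in memory at the same time.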
Complexity:
Time complexity: O((N / batchSize) * (Q(batchSize) + C(batchSize) / maxConcurrentBatchCount))
Space complexity: O(maxConcurrentBatchCount * (batchSize * D + S))
where:
N: number of docs in the traversable
Q(batchSize): average batch query time
C(batchSize): average callback processing time
D: average document size
S: average extra space used by the callback

createMigrator
Creates a migrator that facilitates database migrations. The migrator accepts a custom traverser to traverse the collection. Otherwise it will create a default traverser with your desired traversal config. This migrator does not use atomic batch writes, so it is possible that when a write fails, other writes go through.
Complexity:
Time complexity: TC(traverser) where C(batchSize) = W(batchSize)
Space complexity: SC(traverser) where S = O(batchSize)
where:
W(batchSize): average batch write time
TC(traverser): time complexity of the underlying traverser
SC(traverser): space complexity of the underlying traverser

createBatchMigrator
Creates a migrator that facilitates database migrations. The migrator accepts a custom traverser to traverse the collection. Otherwise it will create a default traverser with your desired traversal config. This migrator uses atomic batch writes, so the entire operation will fail if a single write isn't successful.
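The difference between independent writes and atomic batch writes can be illustrated with a small in-memory sketch. Everything here is hypothetical (`Store`, `applyWrite`, `writeIndividually`, `writeAtomically`) and stands in for real Firestore writes; it does not model whether a real migrator continues after a failed write, only that independent writes can leave a partial result while an atomic batch cannot.

```typescript
type Store = Map<string, number>;
type Write = { id: string; value: number };

// Stand-in for a single write that may fail: negative values are rejected here.
function applyWrite(store: Store, write: Write): void {
  if (write.value < 0) {
    throw new Error(`Rejected write for ${write.id}`);
  }
  store.set(write.id, write.value);
}

// Non-atomic: every write succeeds or fails on its own, so a failure leaves
// the other writes in place. Returns the number of successful writes.
function writeIndividually(store: Store, writes: Write[]): number {
  let succeeded = 0;
  for (const write of writes) {
    try {
      applyWrite(store, write);
      succeeded += 1;
    } catch {
      // This write failed; the others are unaffected.
    }
  }
  return succeeded;
}

// Atomic: stage all writes on a copy and commit only if every one succeeds.
function writeAtomically(store: Store, writes: Write[]): boolean {
  const staged: Store = new Map(store);
  try {
    for (const write of writes) {
      applyWrite(staged, write);
    }
  } catch {
    return false; // Nothing is committed.
  }
  staged.forEach((value, id) => store.set(id, value));
  return true;
}

const writes: Write[] = [
  { id: 'a', value: 1 },
  { id: 'b', value: -1 }, // this one fails
  { id: 'c', value: 3 },
];

const nonAtomicStore: Store = new Map();
const succeeded = writeIndividually(nonAtomicStore, writes);
console.log(succeeded, nonAtomicStore.size); // 2 2

const atomicStore: Store = new Map();
const committed = writeAtomically(atomicStore, writes);
console.log(committed, atomicStore.size); // false 0
```

Choosing between the two is a trade-off: independent writes make partial progress possible, while atomic batches guarantee that a batch is either fully applied or not applied at all.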
Complexity:
Time complexity: TC(traverser) where C(batchSize) = W(batchSize)
Space complexity: SC(traverser) where S = O(batchSize)
where:
W(batchSize): average batch write time
TC(traverser): time complexity of the underlying traverser
SC(traverser): space complexity of the underlying traverser

This project is still very new and we have a lot to work on. We will be moving fast and, until we release v1, there may be breaking changes between minor versions (e.g. when upgrading from 0.4 to 0.5). However, all breaking changes will be documented and you can always use our Releases page as a changelog.
This project is made available under the MIT License.