MongoDB
Preparation
To allow Shaped to connect to your MongoDB database, you need to create a read-only user and share those credentials through the Create Dataset endpoint. You can create a read-only user on a MongoDB database instance with the following commands:
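```shell
# 1. Connect to your MongoDB instance with a user that can create users and roles.
#    If you only have the legacy `mongo` shell (removed in MongoDB 6.0), it accepts the same options.
mongosh --host <host> --port <port> --username <username> --password <password>
```

```javascript
// Switch to the database Shaped will read from.
use <database_name>

// 2. Create a user with read-only access to the whole database.
db.createUser({
  user: "read_only_user",
  pwd: "<password>",
  roles: [{ role: "read", db: "<database_name>" }]
});
```

In MongoDB, the database-level `read` role grants read access to every collection in the database. To restrict access to a specific collection, create a custom role and grant it to the user. Because a user's privileges are the union of all of its roles, also revoke the database-wide `read` role; otherwise the user can still read every collection.

```javascript
// 3. (Optional) Restrict access to a specific collection.
db.createRole({
  role: "read_only_collection_role",
  privileges: [
    {
      resource: { db: "<database_name>", collection: "<collection_name>" },
      actions: ["find"]
    }
  ],
  roles: []
});

db.grantRolesToUser("read_only_user", ["read_only_collection_role"]);

// Remove the database-wide read role so only the collection-level role applies.
db.revokeRolesFromUser("read_only_user", [{ role: "read", db: "<database_name>" }]);
```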
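Replace host, port, database_name, and collection_name with the appropriate values for your MongoDB deployment. In the connection command, username and password are the credentials of an existing user that is allowed to create users and roles; in db.createUser, password is the password you choose for read_only_user.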
Network Access
If your database is publicly accessible, reach out to the Shaped team to get the IP addresses to allowlist. Otherwise, see our private link docs.
Dataset Configuration
Required fields
Field | Example | Description |
---|---|---|
schema_type | MONGODB | Specifies the connector schema type; for MongoDB, this is MONGODB. |
mongodb_connection_string | mongodb://user:password@host:port/database | The connection string for your MongoDB database, including the username, password, host, port, and database name. |
collection | movies | The name of the MongoDB collection to sync. |
database | movielens | The name of the database that contains the collection to sync. |
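If the username or password in mongodb_connection_string contains characters such as @, :, /, or #, they must be percent-encoded or the URI will not parse. The sketch below builds a connection string with JavaScript's standard encodeURIComponent function. buildConnectionString is a hypothetical helper (not part of Shaped or the MongoDB driver), and the host and password values are placeholders.

```javascript
// Hypothetical helper: builds a value for mongodb_connection_string.
// Credentials are percent-encoded so reserved characters such as
// '@', ':', '/' and '#' do not break the URI.
function buildConnectionString({ user, password, host, port, database }) {
  const encodedUser = encodeURIComponent(user);
  const encodedPassword = encodeURIComponent(password);
  return `mongodb://${encodedUser}:${encodedPassword}@${host}:${port}/${database}`;
}

const uri = buildConnectionString({
  user: "read_only_user",
  password: "p@ss:word/1", // placeholder password with reserved characters
  host: "db.example.com",  // placeholder host
  port: 27017,
  database: "movielens",
});
console.log(uri);
// mongodb://read_only_user:p%40ss%3Aword%2F1@db.example.com:27017/movielens
```

The /database path also acts as the default authentication database when no authSource option is given, which matches a user created with db.createUser after `use <database_name>`.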
Optional fields
Field | Example | Description |
---|---|---|
start_date | 2024-01-01 | The date from which to start syncing data. If not provided, the connector will sync all data from the collection. |
replication_key | created_at | The field to use as the replication key. If not provided, the connector will use the document _id field. |
replication_mode | INCREMENTAL or FULL_COLLECTION | The replication mode to use. INCREMENTAL syncs only records newly added to the collection, ordered by _id, while FULL_COLLECTION reads every record in the collection on each run and deduplicates on _id. If not provided, the connector uses INCREMENTAL mode. |
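The two replication modes differ mainly in what each sync run reads. As a rough mental model, the sketch below simulates both in memory: INCREMENTAL remembers the last replication key it saw and returns only documents with a larger key, while FULL_COLLECTION re-reads everything and keeps one row per _id. This is a simplified illustration, not Shaped's implementation; incrementalSync, fullCollectionSync, and the syncState object are hypothetical.

```javascript
// Simplified, in-memory model of the two replication modes.
// Hypothetical: NOT how Shaped implements syncing; it only illustrates
// the semantics described in the table above.

// Compare two replication-key values (numbers or strings).
function compareKeys(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// INCREMENTAL: return only documents whose key is greater than the last
// key seen, in key order, and advance the stored position.
function incrementalSync(collection, syncState, key = "_id") {
  const fresh = collection
    .filter((doc) => syncState.lastKey === undefined || compareKeys(doc[key], syncState.lastKey) > 0)
    .sort((a, b) => compareKeys(a[key], b[key]));
  if (fresh.length > 0) {
    syncState.lastKey = fresh[fresh.length - 1][key];
  }
  return fresh;
}

// FULL_COLLECTION: read every document on each run and upsert it into the
// dataset keyed by _id, so repeated runs do not create duplicate rows.
function fullCollectionSync(collection, dataset) {
  for (const doc of collection) {
    dataset.set(doc._id, doc);
  }
  return dataset;
}

// Usage with toy data.
const movies = [{ _id: 2, title: "Alien" }, { _id: 1, title: "Heat" }];
const syncState = {};
const firstRun = incrementalSync(movies, syncState);  // _id 1 and 2, in order
movies.push({ _id: 3, title: "Up" });
const secondRun = incrementalSync(movies, syncState); // only _id 3

const dataset = new Map();
fullCollectionSync(movies, dataset);
fullCollectionSync(movies, dataset);                  // still 3 rows
console.log(firstRun.length, secondRun.length, dataset.size); // 2 1 3
```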
Dataset Creation Example
Below is an example of a MongoDB dataset connector configuration:
```yaml
name: mongodb_dataset
schema_type: MONGODB
collection: movies
database: movielens
mongodb_connection_string: mongodb://user:password@host:port/database
start_date: "2024-01-01"
```
Save the configuration above as dataset.yaml, then run the following Shaped CLI command to create the MongoDB dataset and begin syncing data from MongoDB into Shaped:

```shell
shaped create-dataset --file dataset.yaml
```
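If you generate dataset.yaml from code, a pre-flight check can catch a missing required field or a mistyped replication_mode before you call the CLI. The sketch below checks a plain JavaScript object against the field tables in this section. validateMongoDatasetConfig is a hypothetical helper, and Shaped's own validation may accept or reject inputs differently.

```javascript
// Hypothetical pre-flight check for a MongoDB dataset config object.
// Field names and allowed values are taken from the tables in this section.
const REQUIRED_FIELDS = ["schema_type", "mongodb_connection_string", "collection", "database"];
const REPLICATION_MODES = ["INCREMENTAL", "FULL_COLLECTION"];

// Returns a list of problems; an empty list means the config looks complete.
function validateMongoDatasetConfig(config) {
  const errors = [];
  for (const field of REQUIRED_FIELDS) {
    if (!config[field]) errors.push(`missing required field: ${field}`);
  }
  if (config.schema_type && config.schema_type !== "MONGODB") {
    errors.push(`schema_type must be MONGODB, got ${config.schema_type}`);
  }
  if (config.replication_mode !== undefined && !REPLICATION_MODES.includes(config.replication_mode)) {
    errors.push(`unknown replication_mode: ${config.replication_mode}`);
  }
  return errors;
}

// The example configuration from this page, as a JavaScript object.
const exampleConfig = {
  name: "mongodb_dataset",
  schema_type: "MONGODB",
  collection: "movies",
  database: "movielens",
  mongodb_connection_string: "mongodb://user:password@host:port/database",
  start_date: "2024-01-01",
};
console.log(validateMongoDatasetConfig(exampleConfig)); // []
```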
How Shaped ingests MongoDB documents
Because MongoDB data is schemaless, Shaped converts each BSON document into a JSON structure and saves it in the dataset's `document` column. Each row also has two metadata columns: `replication_key`, comprised of the document `_id` and the document's created time, and `namespace`, containing the collection and database names.
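To make this layout concrete, the sketch below turns a plain JavaScript object into a row with the three columns described above. It is illustrative only: the exact column types and value formats Shaped uses are not documented here, and toShapedRow is a hypothetical helper. It assumes `_id` is a 24-character hex ObjectId and derives a created time from the timestamp MongoDB embeds in the first four bytes of every ObjectId; whether Shaped computes its created time the same way is an assumption.

```javascript
// Illustrative only: column formats are assumptions, not Shaped's schema.

// An ObjectId's first 4 bytes (8 hex characters) hold its creation time
// as seconds since the Unix epoch.
function objectIdCreatedAt(hexId) {
  return new Date(parseInt(hexId.slice(0, 8), 16) * 1000);
}

// Hypothetical helper: build one dataset row from a document.
function toShapedRow(doc, database, collection) {
  return {
    document: JSON.stringify(doc), // raw document as JSON text
    replication_key: {
      _id: doc._id,
      created_at: objectIdCreatedAt(doc._id).toISOString(),
    },
    namespace: { database, collection },
  };
}

const row = toShapedRow(
  { _id: "65920080a1b2c3d4e5f60718", title: "Heat", genres: ["Crime"] },
  "movielens",
  "movies",
);
console.log(row.replication_key.created_at); // 2024-01-01T00:00:00.000Z
```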
When creating a Shaped model from a MongoDB dataset, use the `document` column to access the raw JSON document data, and DuckDB's JSON extraction functions to pull out specific fields; for example, `json_extract_string(document, '$.title')` returns a document's `title` field as text.