MongoDB
Preparation
To allow Shaped to connect to your MongoDB database, you need to create a read-only user and share those credentials through the Create Dataset endpoint. You can create a read-only user on a MongoDB database instance with the following commands:
# 1. Connect to MongoDB instance and switch to the database.
mongosh --host <host> --port <port> --username <username> --password <password>
use <database_name>
// 2. Create a user with read-only access to the database.
db.createUser({
user: "read_only_user",
pwd: "<password>",
roles: [{ role: "read", db: "<database_name>" }]
});
// 3. (Optional) Restrict access to a specific collection with a custom role.
db.createRole({
role: "read_only_collection_role",
privileges: [
{
resource: {
db: "<database_name>",
collection: "<collection_name>"
},
actions: ["find"]
}
],
roles: []
});
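db.grantRolesToUser("read_only_user", ["read_only_collection_role"]);
In MongoDB, the database-level read role granted in step 2 already allows reads on every non-system collection in the database. Adding the custom role on top of it therefore does not narrow access. To restrict the user to a single collection, create the user in step 2 with an empty role list (roles: []) and grant only read_only_collection_role.
Replace <host>, <port>, <username>, and <password> in step 1 with the credentials of an existing user that is allowed to create users and roles. Set <password> in step 2 to the new read-only user's password, and replace <database_name> and <collection_name> with your own values.
Contact the Shaped team for the IP addresses to add to the allowlist.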
Dataset Configuration
Required fields
Field | Example | Description |
---|---|---|
schema_type | MONGODB | Specifies the connector schema type; for this connector, MONGODB. |
mongodb_connection_string | mongodb://user:password@host:port/database | The connection string for your MongoDB database, including the username, password, host, port, and database name. |
collection | movies | The name of the MongoDB collection to sync. |
database | movielens | The name of the database that contains the collection to sync. |
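The connection string embeds the credentials directly. Characters such as @, : or / in the username or password must therefore be percent-encoded, or the URI will not parse correctly. The helper below is a minimal, hypothetical sketch: buildConnectionString is not part of any Shaped tool. It assembles a value for mongodb_connection_string with encodeURIComponent. It uses the database the read-only user was created in as the path component, which MongoDB also treats as the authentication database when authSource is not set.

```javascript
// Hypothetical helper (not a Shaped API): builds a mongodb_connection_string
// value, percent-encoding credentials so "@", ":" and "/" do not break parsing.
function buildConnectionString({ username, password, host, port, database }) {
  const user = encodeURIComponent(username);
  const pass = encodeURIComponent(password);
  // The /database path doubles as the auth database when authSource is unset.
  return `mongodb://${user}:${pass}@${host}:${port}/${database}`;
}

// Example values only; substitute your own host and credentials.
const uri = buildConnectionString({
  username: "read_only_user",
  password: "p@ss:w/rd",
  host: "db.example.com",
  port: 27017,
  database: "movielens"
});

console.log(uri);
// mongodb://read_only_user:p%40ss%3Aw%2Frd@db.example.com:27017/movielens
```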
Optional fields
Field | Example | Description |
---|---|---|
start_date | 2024-01-01 | The date from which to start syncing data. If not provided, the connector will sync all data from the collection. |
replication_key | created_at | The field to use as the replication key. If not provided, the connector will use the document _id field. |
replication_mode | INCREMENTAL or FULL_COLLECTION | The replication mode to use. INCREMENTAL syncs only records newly added to the collection, ordered by _id. FULL_COLLECTION reads every record in the collection on each run and deduplicates on _id. If not provided, the connector uses INCREMENTAL. |
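To make the difference between the two modes concrete, here is a small in-memory model of the behavior described above. It is an illustrative sketch, not Shaped's implementation. It assumes the default _id replication key, _id values that are same-length hex strings (so string order matches insertion order), and a synced dataset represented as a Map keyed by _id. The names runIncremental and runFullCollection are hypothetical.

```javascript
// Illustrative model of the two replication modes, NOT Shaped's code.
// Assumes the default _id replication key and same-length hex string _ids.
const byId = (a, b) => (a._id < b._id ? -1 : a._id > b._id ? 1 : 0);

// INCREMENTAL: read only documents whose _id is past the last synced _id.
function runIncremental(collection, dataset, state) {
  const fresh = collection
    .filter((doc) => state.lastId === null || doc._id > state.lastId)
    .sort(byId);
  for (const doc of fresh) dataset.set(doc._id, doc);
  if (fresh.length > 0) state.lastId = fresh[fresh.length - 1]._id;
}

// FULL_COLLECTION: re-read every document; keying on _id deduplicates,
// so the latest copy of each document replaces the previously synced one.
function runFullCollection(collection, dataset) {
  for (const doc of collection) dataset.set(doc._id, doc);
}

const collection = [
  { _id: "0001", title: "Heat" },
  { _id: "0002", title: "Alien" }
];
const incremental = new Map();
const state = { lastId: null };
const full = new Map();

runIncremental(collection, incremental, state);
runFullCollection(collection, full);

// Edit an existing document and insert a new one, then sync again.
collection[0] = { _id: "0001", title: "Heat (1995)" };
collection.push({ _id: "0003", title: "Fargo" });
runIncremental(collection, incremental, state);
runFullCollection(collection, full);

console.log(incremental.get("0001").title); // Heat
console.log(full.get("0001").title); // Heat (1995)
console.log(incremental.size, full.size); // 3 3
```

Under these assumptions, an edit to an existing document reaches the dataset only in FULL_COLLECTION mode, because INCREMENTAL never revisits an _id it has already passed. A replication_key that changes on update could behave differently; this sketch models only the default _id key.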
Dataset Creation Example
Below is an example of a MongoDB dataset connector configuration:
name: mongodb_dataset
schema_type: MONGODB
collection: movies
database: movielens
mongodb_connection_string: mongodb://user:password@host:port/database
start_date: "2024-01-01"
Save the configuration above as dataset.yaml, then run the following Shaped CLI command to create the MongoDB dataset and start syncing data into Shaped:
shaped create-dataset --file dataset.yaml
How Shaped ingests MongoDB documents
Because MongoDB data is schemaless, Shaped converts each BSON document into JSON and stores it in the dataset's document column. Each row also carries metadata columns: the replication_key, composed of the document's _id and its created time, and namespace, which contains the collection and database names.
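The sketch below shows how one document might be laid out as a dataset row. Only the document column name comes from this page. The _id, created_at and namespace keys, the "database.collection" namespace format, and the toDatasetRow helper are illustrative assumptions. It also assumes the created time is derived from the ObjectId, whose first 4 bytes encode creation time in seconds since the Unix epoch (the same value ObjectId.getTimestamp() returns). It further assumes _id has already been converted from an ObjectId to its hex string.

```javascript
// Hypothetical sketch of a MongoDB document mapped to a dataset row.
// Only the `document` column name is from the docs; other keys are assumed.

// First 8 hex characters (4 bytes) of an ObjectId = seconds since epoch.
function objectIdTimestamp(hexId) {
  return new Date(parseInt(hexId.slice(0, 8), 16) * 1000);
}

function toDatasetRow(doc, database, collection) {
  return {
    document: JSON.stringify(doc), // raw document serialized as JSON text
    _id: doc._id, // default replication key
    created_at: objectIdTimestamp(doc._id).toISOString(), // assumed source
    namespace: `${database}.${collection}` // assumed MongoDB-style format
  };
}

const row = toDatasetRow(
  { _id: "507f1f77bcf86cd799439011", title: "Heat", year: 1995 },
  "movielens",
  "movies"
);

console.log(row.created_at); // 2012-10-17T21:13:27.000Z
console.log(row.namespace); // movielens.movies
console.log(JSON.parse(row.document).title); // Heat
```

The last line reads a single field back out of the JSON text. A model query has to do the same kind of extraction on the document column.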
When creating a Shaped model from a MongoDB dataset, use the document column to access the raw JSON document data. DuckDB's JSON extraction functions, such as json_extract_string(document, '$.title'), pull specific fields out of the document.