MongoDB

Preparation

To allow Shaped to connect to your MongoDB database, you need to create a read-only user and share those credentials through the Create Dataset endpoint. You can create a read-only user on a MongoDB database instance with the following commands:

# 1. Connect to the MongoDB instance and switch to the database.
# On MongoDB 6.0 and later, use mongosh in place of the legacy mongo shell.
mongo --host <host> --port <port> --username <username> --password <password>
use <database_name>

// 2. Create a user with read-only access to the database.
db.createUser({
  user: "read_only_user",
  pwd: "<password>",
  roles: [{ role: "read", db: "<database_name>" }]
});

// 3. (Optional) Create a role that can only read a specific collection.
db.createRole({
  role: "read_only_collection_role",
  privileges: [
    {
      resource: {
        db: "<database_name>",
        collection: "<collection_name>"
      },
      actions: ["find"]
    }
  ],
  roles: []
});

Note: Contact the Shaped team for the IP addresses to add to the allowlist.

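In MongoDB, the database-level read role automatically grants read access to every collection in the database. To restrict access to a specific collection, create a custom role such as the one in step 3 and grant it to the user. Because MongoDB roles are additive, the restriction only takes effect if the user does not also hold the database-level read role.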

db.grantRolesToUser("read_only_user", ["read_only_collection_role"]);

Replace <host>, <port>, <username>, <password>, <database_name>, and <collection_name> with the appropriate values for your MongoDB deployment.

Dataset Configuration

Required fields

| Field | Example | Description |
| --- | --- | --- |
| schema_type | MONGODB | Specifies the connector schema type, in this case MONGODB. |
| mongodb_connection_string | mongodb://user:password@host:port/database | The connection string for your MongoDB database, including the username, password, host, port, and database name. |
| collection | movies | The name of the MongoDB collection to sync. |
| database | movielens | The name of the database that contains the collection to sync. |
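MongoDB connection strings require special characters in the username and password (such as @, : and /) to be percent-encoded. The sketch below shows one way to assemble a value for mongodb_connection_string using only the Python standard library. The helper name build_mongodb_uri and the sample host and password are hypothetical and only illustrate the format shown in the table.

```python
from urllib.parse import quote


def build_mongodb_uri(user, password, host, port, database):
    """Assemble a mongodb:// URI, percent-encoding the credentials."""
    # safe="" makes quote() encode every reserved character, including "/".
    encoded_user = quote(user, safe="")
    encoded_password = quote(password, safe="")
    return f"mongodb://{encoded_user}:{encoded_password}@{host}:{port}/{database}"


# Hypothetical values for illustration only.
uri = build_mongodb_uri(
    "read_only_user", "p@ss:w/rd", "db.example.com", 27017, "movielens"
)
print(uri)
```

Encoding the credentials separately keeps the @ and : separators of the URI unambiguous when the password itself contains them.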

Optional fields

| Field | Example | Description |
| --- | --- | --- |
| start_date | 2024-01-01 | The date from which to start syncing data. If not provided, the connector syncs all data from the collection. |
| replication_key | created_at | The field to use as the replication key. If not provided, the connector uses the document _id field. |
| replication_mode | INCREMENTAL or FULL_COLLECTION | The replication mode to use. INCREMENTAL syncs only records newly added to the collection, ordered by _id. FULL_COLLECTION reads all records from the collection and deduplicates on _id on each run. If not provided, the connector uses INCREMENTAL. |
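The difference between the two replication modes can be illustrated with a toy in-memory model. This is a sketch based only on the descriptions above, not Shaped's actual implementation: the function names are hypothetical, integers stand in for ObjectId values, and the conclusion that INCREMENTAL does not pick up edits to already-synced documents is an inference from "newly added records".

```python
def sync_incremental(collection, bookmark=None):
    """Return documents with _id greater than the bookmark, ordered by _id."""
    ordered = sorted(collection, key=lambda doc: doc["_id"])
    new_docs = [doc for doc in ordered if bookmark is None or doc["_id"] > bookmark]
    next_bookmark = new_docs[-1]["_id"] if new_docs else bookmark
    return new_docs, next_bookmark


def sync_full_collection(collection, destination):
    """Read every document and deduplicate on _id (the latest read wins)."""
    for doc in collection:
        destination[doc["_id"]] = doc
    return destination


movies = [{"_id": 1, "title": "Toy Story"}, {"_id": 2, "title": "Heat"}]
first_batch, bookmark = sync_incremental(movies)

# Between runs: one existing document is edited and one new document is added.
movies[0] = {"_id": 1, "title": "Toy Story (1995)"}
movies.append({"_id": 3, "title": "Casino"})

# In this model, INCREMENTAL sees only the new document (_id 3).
second_batch, bookmark = sync_incremental(movies, bookmark)

# FULL_COLLECTION re-reads everything; running it twice creates no duplicates.
destination = {}
sync_full_collection(movies, destination)
sync_full_collection(movies, destination)
```

In this model, FULL_COLLECTION trades extra read volume for picking up changes to existing documents, which may matter when documents are updated in place.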

Dataset Creation Example

Below is an example of a MongoDB dataset connector configuration:

name: mongodb_dataset
schema_type: MONGODB
collection: movies
database: movielens
mongodb_connection_string: mongodb://user:password@host:port/database
start_date: "2024-01-01"

The following Shaped CLI command creates the MongoDB dataset from the configuration above (saved as dataset.yaml) and begins syncing data from MongoDB into Shaped.

shaped create-dataset --file dataset.yaml
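Before running the command, it can help to sanity-check the configuration against the required and optional fields listed earlier. The following is a minimal sketch, assuming the YAML file has already been loaded into a Python dict. validate_mongodb_config is a hypothetical helper, not part of the Shaped CLI, and the checks reflect only what this page documents.

```python
from datetime import date

REQUIRED_FIELDS = ("schema_type", "mongodb_connection_string", "collection", "database")
REPLICATION_MODES = ("INCREMENTAL", "FULL_COLLECTION")


def validate_mongodb_config(config):
    """Return a list of problems; an empty list means no issue was found."""
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if not config.get(name)
    ]
    if config.get("schema_type") not in (None, "MONGODB"):
        problems.append("schema_type must be MONGODB")
    mode = config.get("replication_mode")
    if mode is not None and mode not in REPLICATION_MODES:
        problems.append(f"unknown replication_mode: {mode}")
    start = config.get("start_date")
    if start is not None:
        try:
            # str() also covers YAML loaders that parse the value as a date.
            date.fromisoformat(str(start))
        except ValueError:
            problems.append(f"start_date is not a valid ISO date: {start}")
    return problems


# The example configuration from this page, expressed as a dict.
example_config = {
    "name": "mongodb_dataset",
    "schema_type": "MONGODB",
    "collection": "movies",
    "database": "movielens",
    "mongodb_connection_string": "mongodb://user:password@host:port/database",
    "start_date": "2024-01-01",
}
print(validate_mongodb_config(example_config))
```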

How Shaped ingests MongoDB documents

Because MongoDB data is schemaless, Shaped converts each BSON document into a JSON structure and saves it in the dataset's document column. It also adds two metadata columns: replication_key, composed of the document _id and the document creation time, and namespace, which contains the collection and database names.

When creating a Shaped model from a MongoDB dataset, you can use the document column to access the raw JSON document data and apply DuckDB's JSON extraction functions to pull specific fields out of the document.
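To make the extraction step concrete, the sketch below mimics in plain Python what a DuckDB path such as '$.title' selects from the document column. The sample row, the extract_path helper, and the use of the dataset name as the table name in the SQL string are all assumptions for illustration. json_extract_string is a DuckDB JSON function, but the exact query Shaped expects may differ.

```python
import json

# Hypothetical row: the document column holds the MongoDB document as JSON text.
row = {
    "document": json.dumps(
        {"title": "Toy Story", "year": 1995, "ratings": {"average": 3.9}}
    )
}


def extract_path(document_json, path):
    """Resolve a simple '$.a.b' path against a JSON string (object keys only)."""
    value = json.loads(document_json)
    keys = path[2:] if path.startswith("$.") else path
    for key in keys.split("."):
        value = value[key]
    return value


# A DuckDB counterpart, assuming the dataset is queryable by its name.
QUERY = (
    "SELECT json_extract_string(document, '$.title') AS title "
    "FROM mongodb_dataset"
)

print(extract_path(row["document"], "$.title"))
print(extract_path(row["document"], "$.ratings.average"))
```

Nested fields follow the same dotted-path pattern, so '$.ratings.average' reaches into the embedded ratings object.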