"Schemaless" Sources and Destinations
To run a sync, Airbyte requires a catalog, which includes a data schema describing the shape of the data emitted by the source. This schema is used to prepare the destination to receive the data during the sync.
While a strongly typed catalog/schema is possible for most sources, some do not have a reasonably static schema. This document describes the options available for the subset of sources that do not have a strict schema, also known as "schemaless sources".
What is a Schemaless Source?
Schemaless sources are sources for which there is no requirement or expectation that records will conform to a particular pattern. For example, in a MongoDB database, there's no requirement that the fields in one document are the same as the fields in the next, or that the type of value in one field is the same as the type for that field in a separate document. Similarly, for a file-based source such as S3, the files that are present in your source may not all have the same schema.
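To make the MongoDB example concrete, here is a small Python illustration. The two documents and their field names are hypothetical, invented only to show how fields and value types can differ between records in the same collection.

```python
# Two hypothetical documents from the same collection (illustrative data only).
first = {"_id": 1, "name": "Ada", "age": 36}
second = {"_id": 2, "name": "Grace", "age": "unknown", "email": "grace@example.com"}

# Fields present in one document but not the other.
only_in_second = set(second) - set(first)

# Fields present in both documents whose value types disagree.
type_conflicts = {
    key for key in set(first) & set(second)
    if type(first[key]) is not type(second[key])
}

print(only_in_second)   # the extra "email" field
print(type_conflicts)   # "age" is an int in one document and a str in the other
```

Nothing in the source rejects the second document, so any schema for this collection has to account for both the extra field and the conflicting type.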
Although the sources themselves may not conform to an obvious schema, Airbyte still needs to know the shape of the data in order to prepare the destination for the records.
For these sources, during the discover method, Airbyte offers two options to create the schema:
- Dynamic schema inference.
- A hardcoded "schemaless" schema.
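The difference between the two options can be sketched in Python. This is a simplified illustration under stated assumptions, not Airbyte's actual implementation: the helper names (`json_type`, `infer_schema`, `wrap_schemaless`), the sample records, and the choice of a single `data` field for the hardcoded schema are all assumptions made for the example.

```python
def json_type(value):
    """Map a Python value to a JSON Schema type name."""
    if value is None:
        return "null"
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def infer_schema(records):
    """Option 1 (sketch): build a schema from the fields seen in sample records."""
    observed = {}
    for record in records:
        for key, value in record.items():
            observed.setdefault(key, set()).add(json_type(value))
    return {
        "type": "object",
        "properties": {key: {"type": sorted(types)} for key, types in observed.items()},
    }


# Option 2 (sketch): one fixed schema, independent of the records.
# The "data" field name is an assumption made for this illustration.
SCHEMALESS_SCHEMA = {"type": "object", "properties": {"data": {"type": "object"}}}


def wrap_schemaless(record):
    """Nest the whole record under the single field of the fixed schema."""
    return {"data": record}


# Hypothetical sample records with differing fields and types.
sample = [{"id": 1, "name": "a"}, {"id": "2", "active": True}]
inferred = infer_schema(sample)
wrapped = wrap_schemaless(sample[1])
```

In this sketch, inference produces one property per observed field and records every type seen for it, so the result depends on which records were sampled. The fixed schema never changes, at the cost of leaving the record's internal structure undescribed.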