Glossary

    A

    Access - What you are allowed to do/view and what resources you have access to in Matillion ETL. This is managed on an individual basis, through Roles and Permissions.

    Active Directory - A Microsoft directory service that authenticates and authorizes users and resources on a domain network. It stores information about each user and device, validates credentials and access rights, and manages groups, users, and policy administration. Active Directory supports LDAP as its directory access protocol.

    Amazon Redshift - A cloud data warehouse with high availability, scalability, and speed. Used to store large volumes of data.

    Amazon S3 - Amazon Simple Storage Service provides object storage via the internet. An S3 bucket is a cloud storage resource for data.

    Amazon Web Services (AWS) - A collection of services that relate to cloud computing, including computing, storage, and development.

    API - An Application Programming Interface allows two applications to communicate with each other, such as a mobile device pulling weather information from a server, or Uber using a map service.

    API Profile - A specification of metadata for accessing an API endpoint, which will invoke an external API. In Matillion ETL, an API Profile can be either an Extract Profile or a Query Profile.

    Application protocol - A set of rules that determines how processes on clients and servers communicate with each other, and how they run on different systems.

    Audit log - An Enterprise-only feature that provides a log of your activity within a Matillion ETL instance.

    Authentication - Proves your identity through user credentials.

    Authorization - Authorizes you to access certain resources.

    Azure Blob Storage - Storage service for large amounts of unstructured data.

    Azure Portal - A console that allows you to control your web apps and cloud deployment.

    Azure Synapse - A cloud data warehouse and analytics platform with high availability, scalability, and speed. Used to store large volumes of data.


    B

    Backups - Additional copies of your data/workflow in Matillion ETL.

    Basic mode - Builds a query for you in Matillion ETL, without the need to write code.

    Bearer Token - A type of access token used with OAuth 2.0, commonly used in APIs.
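
    As a minimal sketch of how a bearer token is commonly sent, the Python snippet below builds (but does not send) an HTTP request that carries the token in the Authorization header. The URL, the token value, and the function name are placeholders chosen for illustration, not Matillion ETL values.

```python
import urllib.request


def build_bearer_request(url: str, token: str) -> urllib.request.Request:
    """Return a GET request that carries the token in the Authorization header."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


# Placeholder URL and token, used only to show the header format.
req = build_bearer_request("https://api.example.com/v1/items", "example-token")
auth_header = req.get_header("Authorization")
print(auth_header)
```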

    Bucket - In cloud storage, buckets are containers that hold your data.


    C

    CDC - Change Data Capture, only available for Redshift and Snowflake on AWS. It uses AWS DMS (Database Migration Service) and S3 to check for updates to the source database and update the relevant tables within Matillion ETL.

    Cloud data lake - A cloud-hosted repository where you can store structured and unstructured data at any scale, as well as images, music, and videos.

    Cloud data warehouse - Stores data on machines around the world, up to petabytes of data. Unlike on-premise warehouses, which need to be managed and upgraded, with a cloud data warehouse you pay for what you use.

    Cloud storage - Where data is stored on the internet, across multiple servers, often across multiple locations. Data is always available and accessible.

    Clusters - Using two or more storage servers in tandem to enhance performance, availability, and capacity.

    Commit - A snapshot of your repository, in Git, for example. Each time you make a change and commit, it creates a new snapshot.

    Connection - A link to a node. Matillion ETL allows multiple connections to the same node.

    Connector - A connector is a data source that is used to connect to Matillion ETL.

    Console - A developer tool, where you can enter commands and see the output, and configure settings.

    Credentials - Credentials authorize you to work with platform-specific services in your Matillion ETL account. Instance credentials are tied to the instance hosting the client. These must be edited from within the client. User-defined credentials are set by the user in the Manage Credentials menu.

    CRUD - The four basic operations that are used in database applications: Create, Read, Update, and Delete.
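
    The four operations can be illustrated with a short sketch against an in-memory SQLite database from the Python standard library. The table and column names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

# Create: insert a new record.
conn.execute("INSERT INTO customers (id, name) VALUES (?, ?)", (1, "Ada"))

# Read: fetch the record back.
created = conn.execute("SELECT name FROM customers WHERE id = ?", (1,)).fetchone()

# Update: change an existing record.
conn.execute("UPDATE customers SET name = ? WHERE id = ?", ("Grace", 1))
updated = conn.execute("SELECT name FROM customers WHERE id = ?", (1,)).fetchone()

# Delete: remove the record.
conn.execute("DELETE FROM customers WHERE id = ?", (1,))
remaining = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
```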

    CSRF - Cross-Site Request Forgery is an attack that makes the end user perform unwanted actions on a web application. Matillion ETL instances are secured by CSRF tokens that refresh automatically.


    D

    Data - Information, facts, images, sound, and video, which is stored on a computer and can be manipulated and analyzed.

    Data lineage - The progress of data, from creation to transformations over time.

    Data model - The structure and model of a database: how data is connected together, processed, and stored in a system.

    Data source - Where data is loaded from to be staged and transformed in Matillion ETL, for example Facebook Query Component.

    Database drivers - A type of software that lets you 'talk' to your database. It links an application to a database management system.

    Databricks Delta Lake - A cloud-hosted repository where you can store structured and unstructured data at any scale, as well as images, music, and videos.

    Dataset - A collection of data.

    Debug - Testing and removing errors, often using a debugger tool.

    Destructive - Indicates that working with a component could potentially destroy or alter existing data.

    Docker - A platform that simplifies building, running, managing and deploying applications in an isolated environment, away from your infrastructure.

    Document database - Uses JSON-like documents to model data instead of rows and columns, designed to store and manage document-oriented information, referred to as semi-structured data.


    E

    Elastic Compute Cloud (EC2) - Provides compute capacity in the AWS cloud, including virtual servers, networking, and managing storage.

    Endpoint - One end of an API path that allows you to perform a certain function, for example /export typically exports data from the path you are on.

    Enterprise mode - An upgrade and extension of Matillion ETL's functionality. Additional features include: Git integration, generated documentation, audit log, permissions, data lineage, and concurrent connections.

    Environment - Describes a single connection between your Matillion ETL instance and a database. Numerous environments can be set up at a user level, meaning different users can use different environments on a single Matillion ETL instance.

    ETL/ELT - Extract, Transform, and Load (ETL), or Extract, Load, and Transform (ELT). The process of extracting data from a data source, transforming it, and loading it into a target such as a data warehouse; the two terms differ in whether the transformation happens before or after loading. Matillion ETL transforms and loads data in one step.

    Export - To save a copy of the current document, instance, log, image, user details, etc. in another file format for future use. In Matillion ETL, jobs, environments, and variables can be exported from one Matillion ETL instance to another.

    Extract Profile - An API Profile that will invoke an external API and return un-flattened or nested data in the form of a single-column JSON structure. Users must flatten or un-nest this data using the API Extract component and either the Extract Nested Data component or the Nested Data Load component (CDW dependent).
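
    To illustrate what flattening nested data means in general, here is a small Python sketch that turns a nested JSON-like record into a single-level record with dotted column names. It is only an illustration of the idea; the function name, the separator, and the sample record are assumptions, and it does not reproduce how the Matillion ETL components work.

```python
def flatten(record: dict, parent_key: str = "", sep: str = ".") -> dict:
    """Flatten nested dictionaries into one level, joining keys with sep."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects and merge their flattened keys.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


nested = {"id": 1, "customer": {"name": "Ada", "address": {"city": "Leeds"}}}
flat_record = flatten(nested)
```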

    External tables - Tables that are outside the database. Matillion ETL allows you to write new external tables with a column mapping of your choice.

    Extract profile - See Extract Profile above. Extract profiles are used by the API Extract component, which allows you to load data from any accessible JSON API.


    F

    FIFO - In messaging, stands for first-in, first-out. The order in which messages are sent and received is preserved, and each message is delivered exactly once.

    FTP - File Transfer Protocol defines how files are transferred between computers over a network. FTP also acts like a program, sharing files between remote computers.
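
    First-in, first-out ordering can be sketched with a double-ended queue from the Python standard library. This is an in-memory illustration only, not a messaging service, and the message names are invented.

```python
from collections import deque

queue = deque()

# Messages are added to the back of the queue in the order they are sent.
for message in ["first", "second", "third"]:
    queue.append(message)

# Messages are taken from the front, so they come out in the same order,
# and each one is removed from the queue when it is delivered.
delivered = [queue.popleft() for _ in range(len(queue))]
```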


    G

    Generated documentation - An Enterprise feature that provides a downloadable report that details the job layout, components, SQL, and properties.

    Git - A version control system that lets you track your file changes, and allows you to revert to previous versions. Multiple people can collaborate on one source.

    Google BigQuery - A cloud data warehouse with high availability, scalability, and speed. Used to store large volumes of data.

    Google Cloud Platform (GCP) - A collection of services that relate to cloud computing, including computing, storage, and development.

    Graph database - A type of NoSQL database that shows the connections between data, and treats the relationships as being as important as the data itself.

    Groups - Admins can specify who can access various parts of Matillion ETL, and define sets of permissions and users, which can be grouped.


    H

    Historical jobs - An Enterprise feature that allows you to view a job as it was set up at the time it ran.


    I

    Identity and Access Management (IAM) - The management of policies which are attached to IAM identities (users, groups of users, or roles).

    IaaS - Infrastructure as a Service. A computing infrastructure, managed through the internet by a cloud computing service provider, such as Amazon Web Services (AWS). Cloud servers are provided to clients through an API or dashboard interface, so they have full control over the infrastructure. IaaS is similar to a traditional data center, but it is "outsourced" through a virtual data center in the cloud, so the consumer does not need to physically maintain their own data center.

    Import - To bring a file from outside an application, to be read and used by that application.

    Indexes - Used to find data quickly without having to search every row. They improve database performance by reducing the number of disk accesses needed during a query. Indexes are created using a few columns: a search key that contains the Primary Keys, and the data reference column which points to where a particular key is found. Developers can also create indexes on functions or expressions.
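
    As a hedged illustration, the sketch below creates an index in an in-memory SQLite database and asks SQLite for its query plan. The table, column, and index names are invented for the example, and the exact plan wording differs between SQLite versions, so the code only checks that the index name appears in the plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 10, i * 1.5) for i in range(100)],
)

# Index the column used in the WHERE clause below.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# Ask SQLite how it would run the query; the last field of each row is the detail text.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT total FROM orders WHERE customer_id = ?", (3,)
).fetchall()
plan_text = " ".join(row[-1] for row in plan)

matches = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE customer_id = ?", (3,)
).fetchone()[0]
```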

    Instance - A copy of a running program that has been loaded into memory, such as Matillion ETL.

    Iterator - Running a job repeatedly, but with different parameter values each time. An iterator connects to a component and repeatedly executes that component over a set of values.
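
    The idea of an iterator can be sketched in plain Python as a loop that runs the same step once per value. The names load_table and iterate are hypothetical stand-ins for a component and an iterator; they are not Matillion ETL APIs.

```python
def load_table(table_name: str) -> str:
    """Stand-in for a component that does some work for one parameter value."""
    return f"loaded {table_name}"


def iterate(component, values):
    """Run the component once for each value and collect the results in order."""
    return [component(value) for value in values]


results = iterate(load_table, ["customers", "orders", "invoices"])
```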


    J

    JDBC driver - Java Database Connectivity (JDBC) is a Java API for managing connections to a database, issuing queries and commands to that database, and handling result sets obtained from the database. A JDBC driver is the software component that implements this API for a particular database.

    Jobs - Jobs are Matillion ETL's main way of designing, organising and executing workflows.

    JSON - JavaScript Object Notation, an open standard text file format. JSON was created to hold structured data to be used in JavaScript. It's used for different applications, and is the most popular way of sending data for web APIs.
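
    A short sketch using Python's standard json module shows JSON text being parsed into native objects and written back out as text. The sample document is invented for the example.

```python
import json

text = '{"name": "orders", "rows": 3, "tags": ["daily", "sales"]}'

# Parse JSON text into Python objects (dict, list, str, int, ...).
data = json.loads(text)

# Serialize the objects back into JSON text.
round_trip = json.dumps(data, sort_keys=True)
```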


    K

    Key-value database - Key-value databases save data as a group of key-value pairs made up of two data items each. They're also sometimes referred to as a key-value store.


    L

    LDAP - Lightweight Directory Access Protocol is a protocol used for directory services authentication, including Active Directory. LDAP authentication uses two methods: simple, and SASL (Simple Authentication and Security Layer).

    Licence - BYOL (Bring Your Own Licence) versions of Matillion ETL obtained from online marketplaces have no built-in limits. A licence key is required before the software can be used.

    Limit - Setting a range to prevent very large API calls from running and causing errors.


    M

    Metadata - Metadata is data about data. It makes it easy to find and work with certain datasets.

    Migrating - Moving Matillion ETL resources from one Matillion ETL instance to another.


    N

    NoSQL database - NoSQL databases are document structured, rather than tabular like many other databases. NoSQL databases are more scalable and more suited for very large amounts of data compared to relational databases. There is no limitation on the types of data that can be stored together. They are not organized in tables with rows and columns, but are grouped in collections of categories such as users and orders, for example.

    Notes - A feature in Matillion ETL to annotate your jobs, using formatted text boxes.


    O

    OAuth - A popular open standard protocol that provides authorization to web and desktop applications, mobile phones, and smart devices.

    Object-oriented database - In an OODBMS, information is represented as objects used in object-oriented programming. An object database is a hybrid of object-based data and relational databases.

    OpenID - An authentication protocol that allows users to use a single username and password held by a third-party service known as an identity provider. The identity provider confirms the user's identity to the websites they visit (known as relying parties). The user can also provide details such as biometrics or smart card information to authenticate.

    Orchestration - A type of component that loads data from external sources.


    P

    Parameters - An element that defines a type of variable used with data. They let us send information or instructions.

    PaaS - Platform as a Service. PaaS provides a framework for developers to create customizable applications. A third-party provider or enterprise manages the servers, storage and networking, while developers maintain the applications. PaaS is similar to SaaS in delivery, but is geared towards creating software.

    Password - A set of characters that protects resources, or grants access to a particular thing.

    PATH - The unique URL to connect to the Matillion ETL API, e.g. https://<InstanceAddress>/rest/v1/<permission>.

    Permissions - Permissions define which features other users are allowed to use in Matillion ETL.

    Pipeline - Pipelines are configurable objects inside Matillion Data Loader that are responsible for loading data from a single source service to a single target data warehouse.

    Primary key - In a relational database, this identifies each unique record.
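
    A minimal sketch with an in-memory SQLite database shows the uniqueness that a primary key enforces: a second row with the same key is rejected. The table and column names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'ada@example.com')")

try:
    # Reusing the key 1 violates the primary key constraint.
    conn.execute("INSERT INTO users VALUES (1, 'grace@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```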

    Product Improvement Metrics - A system that gathers anonymous data. If enabled, this data will be reported to Matillion who will use it to build better products.

    Project - A logical grouping of configuration settings and resources (such as orchestration and transformation jobs).

    Properties - Metadata about your tables.

    Pub/Sub - Publish/subscribe, a type of asynchronous communication used in serverless and microservices architectures.
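
    The publish/subscribe pattern can be sketched with a tiny in-memory broker. Note the simplification: this sketch delivers messages synchronously within one process, whereas real Pub/Sub services deliver asynchronously across systems. The Broker class and the topic name are hypothetical.

```python
from collections import defaultdict


class Broker:
    """Keeps a list of subscriber callbacks per topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Every subscriber of the topic receives its own copy of the message.
        for callback in self._subscribers[topic]:
            callback(message)


broker = Broker()
received_a, received_b = [], []
broker.subscribe("job-status", received_a.append)
broker.subscribe("job-status", received_b.append)
broker.publish("job-status", "job finished")
```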


    Q

    Query component - A data source that's extracted, loaded, and transformed through Matillion ETL.

    Query profile - An API Profile that will invoke an external API and return the data flattened and structured in tables (columns and rows) according to an RSD file for use with the API Query component.


    R

    Read Only - A role that can be applied to a user, that allows read-only access. This role is currently only available in Matillion ETL instances created via the Billing Platform.

    Relational database - Relational databases essentially store data in tables, or 'relations', divided in rows (records) and columns (fields). SQL (Structured Query Language) is a language used for managing the database, including searching for, inserting, updating, and deleting records.

    Repository - A location where data is stored and managed, for example storing metadata in a particular location.

    Request - Sends an HTTP command to a server to communicate with the network. This could be to get some data or post some data.

    Resources - A collection of accessible entities within a computer system, such as files.

    Response - When making an HTTP request, the information sent back from the server is the response.

    REST - Representational State Transfer, an architectural style that defines standards of communication between web-based computer systems.

    Roles - Different labels given to different users, for example, admin and owner.

    RSD file - A database file format that stores data in a structured format, which is queried by SQL commands.

    Runner - The runner stages data in an appropriate staging area. The runner then transforms this data on the target data warehouse.


    S

    SaaS - Software as a Service. SaaS delivers applications to users, managed by a third-party vendor. SaaS applications usually work through a web browser, and don't require downloads or installation. Vendors manage the technical issues, like data, servers and storage. It reduces the time needed to install and upgrade software on the client side.

    Sample - Shows a preview of your data and allows you to see what will be returned when working with Matillion ETL.

    Schedules - Matillion ETL features a scheduler that will launch orchestration jobs automatically at a pre-defined, regular time interval. Schedules are set up against a project and you can have multiple schedules set up.

    Schemas - A schema defines the structure of something. A database schema is how data is organized and constructed (i.e. relational databases are divided into tables). It also describes the relationship between its entities.

    Script - A set of instructions that is carried out (interpreted) by another program, unlike a compiled program, which is translated by a compiler before it runs.

    SDK - A software development kit provides the tools a developer needs to make software on a specified platform: tools, documentation, libraries, guides, code samples, etc.

    Secret key - A secret key acts like a password. It's a sequence of characters like passwords, account keys, and data connection strings.

    Sequences - Sequences allow users to create an automatically iterating value that can be loaded into tables.

    Server - A section of a computer's hardware or software, that enables other programs or devices to function.

    Shared jobs - Shared jobs allow users to bundle entire workflows into a single custom component and then use those custom components anywhere else in the project.

    Single sign-on - Single sign-on authenticates users to log in to a service with a single ID and password. One set of credentials can be used for multiple applications, for example, when a website allows you to use your Google or Facebook account to access a webpage. Single sign-on uses a central server that all applications trust.

    Snowflake - A cloud data warehouse with high availability, scalability, and speed. Used to store large volumes of data.

    SNS - Amazon Simple Notification Service (SNS) is a notification service that delivers messages in mass quantity, particularly to mobile devices. It uses the Pub/Sub model to push messages to subscribers.

    SOAP - Simple Object Access Protocol is a message protocol that can be carried over HTTP. SOAP is based on XML, a human-readable file format.

    Spectrum - Amazon Redshift Spectrum is a feature of Amazon Redshift that allows a user to perform complex data analysis on objects (like JSON files) stored in an Amazon S3 bucket.

    Stage - A data staging area is an intermediate storage area that sits between your data source and your data targets (for example your data warehouse). Data staging is a temporary state during which data is processed for extract, transform, and load operations.

    SQL - SQL stands for Structured Query Language, and is best used for structured data, that is, data held in fixed fields, such as in databases and spreadsheets. It's used for relational database management systems (RDBMS) and relational data stream management systems (RDSMS).

    SQS - Amazon Simple Queue Service (SQS) is a messaging service that stores messages between computers.

    String - A sequence of characters that could be a variable, or a literal constant.

    Structured data - Data that's held in a table form with relationships between the rows and columns. Examples include JSON and XML.


    T

    Task history - Shows completed tasks, whether successful or failed.

    Token - An object that's used to get access to a restricted resource. A token may be a key/password, signature, or biometric data.

    Transformation - A type of component that applies SQL queries to loaded data.

    Truncate - To shorten something.


    U

    User - A person who uses a computer or service.


    V

    Variable - Variables are name-value pairs stored within each environment. Variables can be used in parameters and expressions to allow the user to pass and centralize environment-specific configuration.
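
    The idea of environment-specific name-value pairs can be sketched with Python's string.Template, which substitutes ${name} placeholders. The environment names, variable names, and query are invented, and this sketch does not show how Matillion ETL itself resolves variables.

```python
from string import Template

# Hypothetical environments, each with its own values for the same variable names.
environments = {
    "dev": {"schema": "dev_schema", "row_limit": "100"},
    "prod": {"schema": "prod_schema", "row_limit": "1000000"},
}

query = Template("SELECT * FROM ${schema}.orders LIMIT ${row_limit}")

# The same expression resolves differently depending on the environment.
dev_sql = query.substitute(environments["dev"])
prod_sql = query.substitute(environments["prod"])
```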

    Version - Versioning captures your development at a point in time, so that the snapshot can be designated as "live" or "production", etc., while you continue working on your default version.

    Views - Virtual tables based on the results of SQL statements, that are saved in the database as named queries. The virtual table itself can be queried.
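
    As a brief sketch, the snippet below saves a query as a named view in an in-memory SQLite database and then queries the view like a table. The table, view, and column names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10), ("north", 5), ("south", 7)],
)

# The view stores the query, not a copy of the data.
conn.execute(
    "CREATE VIEW sales_by_region AS "
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
)

# The view can be queried like any other table.
totals = conn.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region"
).fetchall()
```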


    W

    Wildcard - A symbol used to represent or replace one or more characters. Wildcards are useful in searching through data.
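
    Wildcard matching can be sketched with Python's standard fnmatch module, where * stands for any run of characters and ? for exactly one character. The file names are invented for the example.

```python
from fnmatch import fnmatchcase

files = ["sales_2023.csv", "sales_2024.csv", "orders_2024.csv", "sales_2024.json"]

# "*" matches any number of characters.
sales_csv = [name for name in files if fnmatchcase(name, "sales_*.csv")]

# "?" matches exactly one character.
year_2024 = [name for name in files if fnmatchcase(name, "*_202?.csv")]
```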

    Writer - A role that can view, edit, and execute all parts of Matillion ETL, but cannot delete projects or versions.


    X

    XML - Extensible Markup Language is a markup language that holds structured data. It uses tags that work like HTML. Tag names can contain letters, digits, hyphens, underscores, and periods, and must start with a letter or an underscore. XML was designed to be readable by both humans and machines, containing standard words within its tags. XML has one data type: String.

