gitmyhub

pentaho-kettle

Java · ★ 8.3k · updated 1d ago

Pentaho Data Integration (ETL), a.k.a. Kettle

Pentaho Data Integration (Kettle/PDI) is a Java ETL tool with a visual drag-and-drop designer for building data pipelines that extract, transform, and load data between databases, files, and web services.

Java · Maven · setup: hard · complexity 4/5

Pentaho Data Integration, also known as Kettle or PDI, is a tool for moving and transforming data between different systems. ETL stands for Extract, Transform, Load, which describes the basic idea: pull data out of one place, reshape or clean it, and put it somewhere else. This is a common task when combining data from multiple databases, migrating from one system to another, or preparing raw data for reporting and analysis.

The software has both a visual designer and a command-line engine. Users can build data pipelines by dragging and dropping steps in a graphical interface, connecting them to form a workflow that processes records row by row. The engine then runs those workflows, which can be scheduled or triggered programmatically. It supports connecting to databases, flat files, web services, and many other data sources.
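The row-by-row idea can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Kettle's actual API: the class RowPipelineSketch, the Step interface, and the two example steps are hypothetical names invented here. Each step receives one row and either passes a transformed row to the next step or drops it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Illustrative sketch only: these names are hypothetical and are not Kettle/PDI classes.
public class RowPipelineSketch {

    /** A step receives one row and returns the transformed row, or empty to drop it. */
    public interface Step extends Function<Map<String, Object>, Optional<Map<String, Object>>> {}

    /** Builds a two-field row; LinkedHashMap keeps field order and allows null values. */
    public static Map<String, Object> row(String name, String email) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("name", name);
        r.put("email", email);
        return r;
    }

    /** Two example steps connected in order: trim the name, then drop rows without an email. */
    public static List<Step> defaultSteps() {
        Step trimName = in -> {
            Map<String, Object> out = new LinkedHashMap<>(in);
            out.put("name", String.valueOf(in.get("name")).trim());
            return Optional.of(out);
        };
        Step requireEmail = in -> {
            if (in.get("email") == null) {
                return Optional.empty();
            }
            return Optional.of(in);
        };
        return Arrays.asList(trimName, requireEmail);
    }

    /** Pushes every input row through the steps in order and collects the survivors. */
    public static List<Map<String, Object>> run(List<Map<String, Object>> input, List<Step> steps) {
        List<Map<String, Object>> output = new ArrayList<>();
        for (Map<String, Object> row : input) {
            Optional<Map<String, Object>> current = Optional.of(row);
            for (Step step : steps) {
                current = current.flatMap(step);
                if (!current.isPresent()) {
                    break; // a step dropped this row, so later steps never see it
                }
            }
            current.ifPresent(output::add);
        }
        return output;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> input = Arrays.asList(
                row("  Ada ", "ada@example.com"),
                row("Bob", null));
        for (Map<String, Object> r : run(input, defaultSteps())) {
            System.out.println(r);
        }
    }
}
```

In this sketch the steps run one after another on a single thread, which keeps the example short. It is meant only to show how connected steps form a pipeline and makes no claim about how the real engine schedules its steps.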

This repository is the source code for the open-source community edition of the product. It is organized into several modules: a core library, the main execution engine, an engine extension layer, a database connection dialog, a user interface module, and a plugins folder that extends functionality. The codebase is built with Maven, a Java build tool, and requires Java 11.

Developers who want to build it from source run a standard Maven build command, for example mvn clean install. The project includes unit tests and integration tests, and contributors are expected to link each pull request to an issue in the project's Jira tracker. Code style is enforced with a checkstyle configuration included in the project.

The community forum for questions and support is hosted on the community site of Hitachi Vantara, which now maintains the project.

Where it fits