keep

Python ★ 12k updated 5d ago

The open-source AIOps and alert management platform

An open-source alert management platform that pulls alerts from Datadog, Grafana, PagerDuty, and other tools into one dashboard, with deduplication, AI correlation, and automated workflows that trigger Slack messages or Jira tickets.

Python · Anthropic · OpenAI · Ollama · Gemini · setup: moderate · complexity 3/5

Keep is an open-source platform for managing alerts from multiple monitoring tools in one place. Operations teams typically use many different monitoring services such as Datadog, Grafana, PagerDuty, and CloudWatch, each sending their own alerts. Keep aggregates all of these into a single interface so engineers can see and respond to everything without switching between tools.

The core features include alert deduplication (combining multiple alerts about the same issue into one), correlation (grouping related alerts together so an incident with five symptoms appears as one cluster rather than five separate notifications), enrichment (adding extra context to alerts automatically), and filtering. Alerts can be acknowledged, snoozed, or routed to the right team through a customizable dashboard.
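To make deduplication and correlation concrete, here is a minimal sketch in plain Python. It is not Keep's implementation or API. The alert fields (`source`, `service`, `name`), the fingerprint scheme, and the function names are all assumptions chosen for illustration. The idea is that repeats of the same issue hash to the same fingerprint and collapse into one record, and the remaining alerts are then grouped by a shared attribute.

```python
import hashlib
from collections import defaultdict


def fingerprint(alert, fields=("source", "service", "name")):
    """Hash the identifying fields so repeats of one issue share a key.

    The field names are hypothetical; a real platform lets you choose them.
    """
    key = "|".join(str(alert.get(f, "")) for f in fields)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def deduplicate(alerts):
    """Collapse alerts with the same fingerprint, counting the repeats."""
    unique = {}
    for alert in alerts:
        fp = fingerprint(alert)
        if fp in unique:
            unique[fp]["count"] += 1
        else:
            unique[fp] = {**alert, "count": 1}
    return list(unique.values())


def correlate(alerts, key="service"):
    """Group alerts that share a key (here: the affected service)."""
    clusters = defaultdict(list)
    for alert in alerts:
        clusters[alert.get(key, "unknown")].append(alert)
    return dict(clusters)


# Illustrative data only, not real payloads from these tools.
alerts = [
    {"source": "datadog", "service": "checkout", "name": "high_latency"},
    {"source": "datadog", "service": "checkout", "name": "high_latency"},
    {"source": "grafana", "service": "checkout", "name": "error_rate"},
    {"source": "cloudwatch", "service": "search", "name": "cpu_high"},
]

deduped = deduplicate(alerts)    # the two identical Datadog alerts merge
incidents = correlate(deduped)   # checkout symptoms form one cluster
```

Grouping by a single attribute is the simplest possible correlation rule. The AI-based correlation described later presumably looks at more than one field, but the goal of fewer, larger clusters is the same.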

Keep also includes a workflow engine, described by the project as "GitHub Actions for your monitoring tools." Workflows are automated sequences that trigger when certain alert conditions are met, such as sending a message to Slack, creating a ticket in Jira, or running a script to restart a failing service. The integrations are bidirectional, meaning Keep can both receive alerts from external tools and push actions back to them.
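The trigger-then-act pattern behind such a workflow can be sketched as follows. This is a plain-Python illustration, not Keep's workflow syntax or API. The `Workflow` class, the `severity` field, and the two action functions are hypothetical, and the actions return strings where a real integration would call Slack or Jira.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Workflow:
    """A condition plus an ordered list of actions (hypothetical model)."""
    name: str
    condition: Callable[[dict], bool]
    actions: list = field(default_factory=list)

    def run(self, alert):
        # Skip the workflow entirely when the alert does not match.
        if not self.condition(alert):
            return []
        # Otherwise run every action in order and collect the results.
        return [action(alert) for action in self.actions]


def notify_slack(alert):
    # Stand-in: a real action would post to a Slack channel.
    return f"slack: {alert['name']} on {alert['service']}"


def open_jira_ticket(alert):
    # Stand-in: a real action would create a Jira issue.
    return f"jira: ticket for {alert['name']}"


escalate = Workflow(
    name="high-severity-escalation",
    condition=lambda a: a.get("severity") == "critical",
    actions=[notify_slack, open_jira_ticket],
)

results = escalate.run(
    {"name": "db_down", "service": "orders", "severity": "critical"}
)
skipped = escalate.run(
    {"name": "disk_warn", "service": "orders", "severity": "warning"}
)
```

Here the critical alert produces one result per action, while the warning matches no condition and triggers nothing.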

AI features are built in through connections to several AI providers, including Anthropic, OpenAI, Gemini, DeepSeek, and local models via Ollama. These are used for tasks like summarizing an incident, correlating alerts that share a root cause, or gathering additional context automatically when an alert fires.

Keep is written in Python. A hosted version is available at platform.keephq.dev for trying it out, and documentation is at docs.keephq.dev.

Where it fits

Aggregate alerts from Datadog, Grafana, and PagerDuty into one dashboard so on-call engineers stop switching between tools.
Set up a workflow that automatically posts to Slack and opens a Jira ticket when a high-severity alert fires.
Deduplicate and correlate multiple related alerts about the same outage into a single incident view to reduce noise.
Use AI integration to automatically summarize an active incident and gather context from linked monitoring tools.
