Data Engineer Agent

Built with Strands Agents and Tools, AWS services (S3, Glue, Athena, and Bedrock), Apache Iceberg, and the Model Context Protocol with FastMCP to process small datasets using AWS SDK for pandas and Athena.

[WIP] Data Engineer Agent

This agent is mainly meant to help in the sandbox environment by inferring schemas, querying and loading data, and setting up Iceberg tables.
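As a rough illustration of the schema and table-setup steps, the sketch below infers Athena column types from a few sample rows and builds a CREATE TABLE statement for an Iceberg table. The function names (`infer_athena_type`, `infer_schema`, `build_iceberg_ddl`), the type-mapping rules, the sample names, and the `s3://<bucket>/<table>/` location layout are all assumptions made for this example, not the agent's actual implementation. Only the `TBLPROPERTIES ('table_type'='ICEBERG')` clause follows Athena's documented syntax for Iceberg tables.

```python
def infer_athena_type(values):
    """Pick the narrowest Athena type that fits every non-empty string value."""
    non_empty = [v for v in values if v not in ("", None)]
    if not non_empty:
        return "string"  # nothing to learn from, so fall back to string

    def all_parse(cast):
        # True only if every value can be converted by `cast`.
        for v in non_empty:
            try:
                cast(v)
            except ValueError:
                return False
        return True

    if all(v.lower() in ("true", "false") for v in non_empty):
        return "boolean"
    if all_parse(int):
        return "bigint"
    if all_parse(float):
        return "double"
    return "string"


def infer_schema(rows):
    """Map each column name to an inferred Athena type, keeping column order."""
    columns = list(rows[0].keys()) if rows else []
    return {c: infer_athena_type([r.get(c, "") for r in rows]) for c in columns}


def build_iceberg_ddl(database, table, schema, bucket):
    """Build an Athena CREATE TABLE statement for an Iceberg table.

    The location layout is a hypothetical convention for this sketch.
    """
    cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in schema.items())
    return (
        f"CREATE TABLE {database}.{table} (\n  {cols}\n)\n"
        f"LOCATION 's3://{bucket}/{table}/'\n"
        "TBLPROPERTIES ('table_type'='ICEBERG')"
    )


# Hypothetical sample rows, as they might come out of a small CSV file.
rows = [
    {"id": "1", "price": "9.90", "active": "true", "name": "pen"},
    {"id": "2", "price": "12", "active": "false", "name": "notebook"},
]
schema = infer_schema(rows)
ddl = build_iceberg_ddl("sandbox_db", "products", schema, "my-data-bucket")
print(ddl)
```

In practice the inferred types would likely need a review step, since a small sample can make a text column look numeric.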

Architecture Design

[Figure: architecture design diagram]

Requirements

🚀 Powered by uv.

🐍 All dependencies are declared in pyproject.toml. Install them with:

uv sync

Configuration file

This app uses Dynaconf. Create a file named settings.toml containing all the settings the app needs.

Below is an example of a settings file.

[development]
MCP_SERVER_HOST="127.0.0.1"
MCP_SERVER_PORT=8000
AGENT_LANGUAGE="Portuguese"
AGENT_SHOW_REASONING="true"
BYPASS_TOOL_CONSENT="true"
AWS_BEDROCK_MODEL_ID="<bedrock model id>"
AWS_GLUE_CATALOG_DATABASE="<glue database>"
AWS_DATA_BUCKET="<bucket used to upload files and for table location>"
AWS_DEFAULT_REGION="us-east-1"
AWS_ACCESS_KEY_ID="<Access Key>"
AWS_SECRET_ACCESS_KEY="<Secret Key>"
AWS_SESSION_TOKEN="<Token>"
AWS_PROFILE="<profile name>"

BYPASS_TOOL_CONSENT is used by Strands to bypass the confirmation step before a tool is used.

AWS_PROFILE is optional. If it is provided, all other credential variables, such as AWS_ACCESS_KEY_ID, are ignored.
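The AWS_PROFILE precedence rule can be sketched as follows. This is a minimal illustration, not the project's code: `session_kwargs` is a hypothetical helper that takes the settings as a plain dict and returns keyword arguments using the parameter names that `boto3.Session` accepts (`profile_name`, `aws_access_key_id`, `aws_secret_access_key`, `aws_session_token`, `region_name`). Keeping the region alongside the profile is also an assumption.

```python
def session_kwargs(settings):
    """Build boto3.Session-style keyword arguments from a settings dict.

    Hypothetical helper: when AWS_PROFILE is set, explicit key variables
    are ignored, mirroring the rule described above.
    """
    kwargs = {}

    region = settings.get("AWS_DEFAULT_REGION")
    if region:
        kwargs["region_name"] = region

    profile = settings.get("AWS_PROFILE")
    if profile:
        # A named profile wins over any explicit keys.
        kwargs["profile_name"] = profile
        return kwargs

    # No profile: fall back to the explicit credential variables.
    mapping = {
        "AWS_ACCESS_KEY_ID": "aws_access_key_id",
        "AWS_SECRET_ACCESS_KEY": "aws_secret_access_key",
        "AWS_SESSION_TOKEN": "aws_session_token",
    }
    for setting_name, arg_name in mapping.items():
        value = settings.get(setting_name)
        if value:
            kwargs[arg_name] = value
    return kwargs


# Example values are placeholders, not real credentials.
with_profile = session_kwargs({
    "AWS_DEFAULT_REGION": "us-east-1",
    "AWS_PROFILE": "sandbox",
    "AWS_ACCESS_KEY_ID": "ignored-key",
})
with_keys = session_kwargs({
    "AWS_DEFAULT_REGION": "us-east-1",
    "AWS_ACCESS_KEY_ID": "example-key",
    "AWS_SECRET_ACCESS_KEY": "example-secret",
})
print(with_profile)
print(with_keys)
```

The resulting dict could then be passed along as `boto3.Session(**kwargs)`.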

Amazon Bedrock

Access to Amazon Bedrock foundation models, with the exception of OpenAI gpt-oss-120b and gpt-oss-20b models, isn't granted by default.

You can request access, or modify access, to foundation models only by using the Amazon Bedrock console. Read more.

Running

Data processing is handled by an MCP server built with FastMCP (yes, I know, I could have added the tools directly to my Strands agent). Start the server first:

uv run -m dea.server

Finally, let's run the agent.

uv run -m dea.agent

Future implementations

Glue Data Quality: create, run, and show results.
Knowledge base with columns information.
Change schema table and column comments.
Enable and run Glue Iceberg Optimizations.
New input data formats (XLSX, Parquet, etc.).