Data Engineer Agent
Using Strands Agents and Tools, AWS services (S3, Glue, Athena, and Bedrock), Apache Iceberg, and the Model Context Protocol with FastMCP to process small data using AWS SDK for pandas and Athena.
[WIP] Data Engineer Agent
This agent mainly helps out in the Sandbox Environment by inferring schemas, querying and loading data, and setting up Iceberg tables.
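To make the schema-inference and table-setup steps concrete, here is a minimal sketch using only the Python standard library. It is not the project's implementation (which relies on AWS SDK for pandas and Athena). The helper names `infer_type`, `infer_schema`, and `iceberg_ddl`, the sample data, the database and table names, and the bucket path are all hypothetical. It guesses a column type from sample CSV values and renders an Athena `CREATE TABLE` statement with the Iceberg table property.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest Athena type (of four) that fits every non-empty value."""
    non_empty = [v.strip() for v in values if v.strip() != ""]
    if not non_empty:
        return "string"  # nothing to inspect, fall back to the widest type
    if all(v.lower() in ("true", "false") for v in non_empty):
        return "boolean"

    def all_parse(cast):
        try:
            for v in non_empty:
                cast(v)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "bigint"
    if all_parse(float):
        return "double"
    return "string"


def infer_schema(csv_text):
    """Map each header name to an inferred type, reading the CSV from a string."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    columns = list(zip(*data)) if data else [() for _ in header]
    return {name: infer_type(col) for name, col in zip(header, columns)}


def iceberg_ddl(database, table, schema, location):
    """Render an Athena CREATE TABLE statement for an Iceberg table."""
    cols = ",\n  ".join(f"{name} {col_type}" for name, col_type in schema.items())
    return (
        f"CREATE TABLE {database}.{table} (\n  {cols}\n)\n"
        f"LOCATION '{location}'\n"
        "TBLPROPERTIES ('table_type' = 'ICEBERG')"
    )


# Hypothetical sample data and names, for illustration only.
sample = "id,price,active,name\n1,9.5,true,widget\n2,12,false,gadget\n"
schema = infer_schema(sample)
ddl = iceberg_ddl("sandbox_db", "products", schema, "s3://example-bucket/products/")
print(schema)
print(ddl)
```

A real pipeline would likely also need date, timestamp, and decimal detection; this sketch keeps to four types to stay short.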
Architecture Design

Requirements
🚀 Powered by uv.
🐍 Everything you need is listed in pyproject.toml. Install the dependencies with:
uv sync
Configuration file
This app uses Dynaconf. Create a file named settings.toml containing all the settings it needs.
Below is an example settings file.
[development]
MCP_SERVER_HOST="127.0.0.1"
MCP_SERVER_PORT=8000
AGENT_LANGUAGE="Portuguese"
AGENT_SHOW_REASONING="true"
BYPASS_TOOL_CONSENT="true"
AWS_BEDROCK_MODEL_ID="<bedrock model id>"
AWS_GLUE_CATALOG_DATABASE="<glue database>"
AWS_DATA_BUCKET="<bucket used to upload files and for table location>"
AWS_DEFAULT_REGION="us-east-1"
AWS_ACCESS_KEY_ID="<Access Key>"
AWS_SECRET_ACCESS_KEY="<Secret Key>"
AWS_SESSION_TOKEN="<Token>"
AWS_PROFILE="<profile name>"
BYPASS_TOOL_CONSENT is used by Strands to bypass the confirmation prompt before a tool is used.
AWS_PROFILE is optional. If provided, all other credential variables, such as AWS_ACCESS_KEY_ID, are ignored.
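The precedence rule for AWS_PROFILE can be sketched as follows. This is an illustration of the behavior described above, not the app's actual code, and the helper name `session_kwargs` is hypothetical. It turns a settings mapping into keyword arguments of the kind `boto3.Session` accepts (`profile_name`, `region_name`, `aws_access_key_id`, `aws_secret_access_key`, `aws_session_token`).

```python
def session_kwargs(settings):
    """Build boto3.Session keyword arguments from a settings mapping.

    Hypothetical helper: when AWS_PROFILE is set, the explicit key
    variables are skipped, mirroring the rule stated in this README.
    """
    kwargs = {}
    region = settings.get("AWS_DEFAULT_REGION")
    if region:
        kwargs["region_name"] = region

    profile = settings.get("AWS_PROFILE")
    if profile:
        # A profile wins: access key, secret key, and token are ignored.
        kwargs["profile_name"] = profile
        return kwargs

    key_map = (
        ("AWS_ACCESS_KEY_ID", "aws_access_key_id"),
        ("AWS_SECRET_ACCESS_KEY", "aws_secret_access_key"),
        ("AWS_SESSION_TOKEN", "aws_session_token"),
    )
    for setting_name, arg_name in key_map:
        value = settings.get(setting_name)
        if value:
            kwargs[arg_name] = value
    return kwargs


# Illustrative values only; never hard-code real credentials.
with_profile = session_kwargs({
    "AWS_DEFAULT_REGION": "us-east-1",
    "AWS_PROFILE": "sandbox",
    "AWS_ACCESS_KEY_ID": "ignored-because-profile-is-set",
})
with_keys = session_kwargs({
    "AWS_DEFAULT_REGION": "us-east-1",
    "AWS_ACCESS_KEY_ID": "example-key-id",
    "AWS_SECRET_ACCESS_KEY": "example-secret",
})
print(with_profile)
print(with_keys)
```

The resulting dictionary could then be passed along as `boto3.Session(**kwargs)`, assuming boto3 is installed.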
Amazon Bedrock
Access to Amazon Bedrock foundation models, with the exception of OpenAI gpt-oss-120b and gpt-oss-20b models, isn't granted by default.
You can request access, or modify access, to foundation models only by using the Amazon Bedrock console. Read more.
Running
Data processing is done by an MCP server built with FastMCP (yes, I know, I could add the tools directly to my Strands agent). Start the server first:
uv run -m dea.server
Finally, let's run the agent.
uv run -m dea.agent
Future implementations
- Glue Data Quality: create rules, run them, and show results.
- Knowledge base with column information.
- Change table schema and column comments.
- Enable and run Glue Iceberg optimizations.
- New input data formats (XLSX, Parquet, etc.).
