Data Engineer
Job ID: 75605
Posted today
Exton, Pennsylvania
65 - 85/hr
Contract
Remote
Job Details
DATA ENGINEER - PHILADELPHIA, PA (REMOTE)
The Select Group is hiring a Data Engineer to support the design, implementation, and operation of a large-scale, AWS-hosted data platform for critical building automation and industrial telemetry systems across approximately 1,700 sites. This engineer will develop and maintain pipelines that ingest data from BMS, SCADA, HVAC controllers, environmental sensors, and utility metering systems into a centralized AWS data lake built on Amazon S3, Lake Formation, and Glue Catalog. The role also covers configuring AWS IoT SiteWise asset models, managing time-series data in Amazon Timestream, and ensuring the scalability, reliability, and performance of the overall platform. Working closely with cross-functional engineering teams and AWS architects, this position plays a foundational role in enterprise-wide data operations and infrastructure strategy.
What You'll Bring:
- 4+ years in data engineering with production pipeline experience
- Strong Python and SQL; experience with Spark or Pandas for large-scale data processing
- Hands-on experience with managed time-series storage: Amazon Timestream, InfluxDB, TimescaleDB, or Historian systems
- AWS data lake experience: S3 + Glue + Lake Formation + Athena (or willingness to adopt; Azure Data Lake or Databricks experience translates)
- AWS IoT service exposure: IoT Core, IoT SiteWise, IoT Greengrass (or strong willingness to learn)
- API and protocol integration experience: REST, MQTT, BACnet, Modbus, OPC-UA
- Data quality monitoring and alerting, not just pipeline building
- Git and CI/CD familiarity for pipeline code management; infrastructure-as-code experience (Terraform or AWS CDK) preferred
Bonus Experience:
- Experience with BMS, SCADA, or industrial IoT data sources
- Apache Kafka or Amazon Kinesis Data Streams experience
- dbt (data build tool) or equivalent transformation framework
- Amazon Managed Grafana or similar observability tooling
- Experience in critical facility, utility, telecom, or energy sector environments
- Familiarity with NIST SP 800-82 (OT security) or NERC CIP
What You'll Do:
- Design and build data ingestion pipelines from BMS, SCADA, HVAC controllers, environmental sensors, and utility metering systems into AWS IoT SiteWise and the shared data lake
- Configure AWS IoT SiteWise asset models that mirror the physical facility hierarchy (site → building → system → component → measurement)
- Architect and maintain the shared data lake schema on S3 + Lake Formation + Glue Catalog; coordinate schema changes across the joint team
- Configure Amazon Timestream for hot time-series storage with appropriate retention tiers (memory store and magnetic store)
- Implement data quality frameworks: completeness checks, anomaly detection, cadence validation, and alerting via CloudWatch and SNS
- Build and document AWS IoT Greengrass v2 component integrations with source systems via BACnet (IP and MS/TP), Modbus (TCP and RTU), OPC-UA, SNMP, REST APIs, and MQTT (with mTLS authentication via X.509 device certificates)
- Optimize query performance and partitioning strategies for time-series facility data in Timestream and Athena (over S3)
- Support the Simulation Engineer's data feed requirements and HVAC automation event logging for the VBAS POC
- Operate the production data lake at fleet scale, monitor pipeline health, and manage schema governance
- Produce data flow diagrams, schema documentation, and a data dictionary for handoff and post-handoff operations
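For candidates wondering what the completeness checks and cadence validation mentioned above might involve, here is a minimal Python sketch. It is an illustration only, not the platform's actual implementation: the function name `check_cadence`, the `tolerance` parameter, and the sample readings are all hypothetical.

```python
from datetime import datetime, timedelta


def check_cadence(timestamps, expected_interval, tolerance=0.5):
    """Return (completeness, gaps) for one sensor's sorted reading timestamps.

    completeness: fraction of expected readings actually present (0.0 to 1.0).
    gaps: list of (previous, next) timestamp pairs whose spacing exceeds
          expected_interval * (1 + tolerance).
    All names and thresholds here are assumptions for illustration.
    """
    if len(timestamps) < 2:
        return 0.0, []

    # How many readings should exist between the first and last timestamp.
    span = timestamps[-1] - timestamps[0]
    expected_count = int(span / expected_interval) + 1
    completeness = min(1.0, len(timestamps) / expected_count)

    # Flag consecutive readings that are further apart than allowed.
    max_gap = expected_interval * (1 + tolerance)
    gaps = [
        (prev, nxt)
        for prev, nxt in zip(timestamps, timestamps[1:])
        if nxt - prev > max_gap
    ]
    return completeness, gaps


# Hypothetical one-minute sensor that missed the readings at minutes 3 and 4.
start = datetime(2024, 1, 1)
readings = [start + timedelta(minutes=m) for m in (0, 1, 2, 5, 6)]
completeness, gaps = check_cadence(readings, timedelta(minutes=1))
print(f"completeness={completeness:.2f}, gaps={len(gaps)}")
```

In a production setting, results like these would typically feed an alerting path such as the CloudWatch and SNS setup named in the responsibilities above; that wiring is out of scope for this sketch.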
TSG is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran.