
PySpark Developer // Remote

Hello,

Please find the requirement details below.

 

Position: PySpark Developer

Location: Columbus, OH (Remote; candidates in or near OH required)

Duration: 6 Months


 

Required Skills

PySpark, Hadoop, DataStage/SSIS, DB2

 

Job Description

SUMMARY:

5+ years of experience handling Data Warehousing and Business Intelligence projects in the Banking, Finance, Credit Card, and Insurance industries.

Design and develop real-time streaming pipelines for sourcing data from IoT devices; define strategy for data lakes, data flow, retention, aggregation, and summarization to optimize the performance of analytics products.

Extensive experience in data analytics.

Good knowledge of Hadoop architecture and its ecosystem.

Extensive Hadoop experience in storage, writing queries, and processing and analyzing data.

Experience migrating on-premises ETL processes to the cloud.

Experience working with various Hadoop file formats.

Experience in Data Warehousing applications; responsible for the Extraction, Transformation, and Loading (ETL) of data from multiple sources into the data warehouse.

Experience optimizing Hive SQL queries, DataStage jobs, and Spark jobs.

Implemented frameworks for data quality analysis, data governance, data trending, data validation, and data profiling using technologies such as Spark, Python, and DB2.

Experience creating technical documentation: functional requirements, impact analyses, technical design documents, and data flow diagrams in MS Visio.

Experience delivering highly complex projects using Agile and Scrum methodologies.

Quick learner, up to date with industry trends; excellent written and oral communication, analytical, and problem-solving skills; a good team player with the ability to work independently; well organized.
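As a rough illustration of the data quality and profiling work described above, a minimal column-level check might look like the sketch below. Plain Python dicts stand in for Spark rows here, and all column names and thresholds are hypothetical, not taken from the posting:

```python
# Illustrative sketch only: a minimal null-rate profiling check of the kind a
# data-quality framework might run before loading data into a warehouse.

def profile_null_rates(rows, columns):
    """Return the fraction of missing (None) values per column."""
    total = len(rows)
    if total == 0:
        return {c: 0.0 for c in columns}
    return {
        c: sum(1 for r in rows if r.get(c) is None) / total
        for c in columns
    }

def failed_columns(null_rates, max_null_rate=0.05):
    """Flag columns whose null rate exceeds the allowed threshold."""
    return sorted(c for c, rate in null_rates.items() if rate > max_null_rate)
```

In a Spark job the same rule would typically be expressed with DataFrame aggregations so the scan runs distributed rather than in driver memory.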

 

PROFESSIONAL EXPERIENCE:

Design and develop ETL integration patterns using Python on Spark.

Develop a framework for converting existing DataStage mappings into PySpark (Python on Spark) jobs.

Create PySpark data frames to bring in data from DB2.

Translate business requirements into maintainable software components and understand their technical and business impact.

Provide guidance to the development team working with PySpark as the ETL platform.

Optimize PySpark jobs to run on a Kubernetes cluster for faster data processing.

Provide workload estimates to the client.

Migrate on-premises ETL processes to the AWS cloud and Snowflake.

Implement a CI/CD (Continuous Integration and Continuous Delivery) pipeline for code deployment.

Review components developed by team members.
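For the DB2 ingestion responsibility above, a PySpark read over JDBC is the usual approach. The sketch below is only an illustration, assuming a standard `jdbc:db2://` connection string and IBM's `com.ibm.db2.jcc.DB2Driver`; the host, table, and credential values are placeholders, not details from the posting:

```python
# Hypothetical sketch: pulling a DB2 table into a PySpark DataFrame via the
# generic Spark JDBC data source. Requires the DB2 JDBC driver jar on the
# Spark classpath; all connection values below are placeholders.

def db2_jdbc_url(host: str, port: int, database: str) -> str:
    """Build a DB2 JDBC connection URL in the standard jdbc:db2:// form."""
    return f"jdbc:db2://{host}:{port}/{database}"

def read_db2_table(spark, url: str, table: str, user: str, password: str):
    """Load one DB2 table into a Spark DataFrame via the JDBC data source."""
    return (
        spark.read.format("jdbc")
        .option("url", url)
        .option("dbtable", table)
        .option("user", user)
        .option("password", password)
        .option("driver", "com.ibm.db2.jcc.DB2Driver")
        .load()
    )

# Typical use (needs a live SparkSession and DB2 connectivity):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.appName("db2-ingest").getOrCreate()
#   df = read_db2_table(spark, db2_jdbc_url("db2host", 50000, "DWH"),
#                       "SCHEMA.CUSTOMERS", "etl_user", "****")
```

For large tables, the same read would normally add `partitionColumn`/`lowerBound`/`upperBound`/`numPartitions` options so Spark parallelizes the extract instead of pulling through a single connection.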

 

 

Thanks & Regards,

Saranya

Technical Recruiter


 

 

Phone: (302) 204-0565

Email: saranya@imcsgroup.net

9901 East Valley Ranch Parkway

Suite 3020, Irving, Texas 75063



Disclaimer
This electronic mail (including any attachments) may contain information that is privileged, confidential, and/or otherwise protected from disclosure to anyone other than its intended recipient(s). Any dissemination or use of this electronic mail or its contents (including any attachments) by persons other than the intended recipient(s) is strictly prohibited. If you have received this message in error, please notify us immediately by reply e-mail or e-mail unsubscribe@imcsgroup.net so that we may correct our internal records. Please then delete the original message (including any attachments) in its entirety. Thank you
