Data Engineer with Spark and Scala

Skills: Spark, Scala

Skills 

Experience in data pipeline engineering for both batch and streaming applications.

Kafka streaming and security: when connecting via Kafka, must understand security and authentication, including SSL. Experience with Confluent Kafka.
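
For illustration, a minimal sketch of what an SSL-secured Kafka connection looks like with the plain kafka-clients consumer API; the broker address, topic name, and truststore/keystore paths and passwords are all placeholders:

    import java.time.Duration
    import java.util.{Collections, Properties}
    import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}
    import org.apache.kafka.common.serialization.StringDeserializer

    object SecureKafkaConsumer {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9093") // placeholder broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "ingest-pipeline")
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getName)
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getName)

        // Encrypt traffic and authenticate over SSL/TLS; store paths and
        // passwords are placeholders and would normally come from a vault.
        props.put("security.protocol", "SSL")
        props.put("ssl.truststore.location", "/etc/kafka/client.truststore.jks")
        props.put("ssl.truststore.password", "changeit")
        props.put("ssl.keystore.location", "/etc/kafka/client.keystore.jks") // only needed for mutual TLS
        props.put("ssl.keystore.password", "changeit")

        val consumer = new KafkaConsumer[String, String](props)
        consumer.subscribe(Collections.singletonList("events")) // placeholder topic
        consumer.poll(Duration.ofSeconds(5))
          .forEach(r => println(s"${r.key} -> ${r.value}"))
        consumer.close()
      }
    }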

Experience with data ingestion processes, building data pipelines, and performance tuning with Snowflake and AWS.
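
As a hedged sketch of such a pipeline, the snippet below assumes the Spark-Snowflake connector (net.snowflake.spark.snowflake) is on the classpath; the account, credentials, bucket, and table names are placeholders:

    import org.apache.spark.sql.SparkSession

    object SnowflakeIngest {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("snowflake-ingest").getOrCreate()

        // Connection options for the connector; every value is a placeholder.
        val sfOptions = Map(
          "sfURL"       -> "myaccount.snowflakecomputing.com",
          "sfUser"      -> "etl_user",
          "sfPassword"  -> sys.env("SNOWFLAKE_PASSWORD"), // keep secrets out of code
          "sfDatabase"  -> "ANALYTICS",
          "sfSchema"    -> "PUBLIC",
          "sfWarehouse" -> "ETL_WH"
        )

        // Land raw files from S3 and append them to a Snowflake table.
        val orders = spark.read.parquet("s3a://my-bucket/raw/orders/")
        orders.write
          .format("net.snowflake.spark.snowflake")
          .options(sfOptions)
          .option("dbtable", "ORDERS")
          .mode("append")
          .save()
      }
    }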

Knowledge of AWS services such as S3 (storage) and EC2, at roughly an AWS-101 level, including how to connect to AWS and how encryption is applied.
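
For example, a minimal sketch of a Spark job reading from S3 through the s3a connector with server-side encryption requested; the bucket and path are placeholders:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("s3-read").getOrCreate()
    val hadoopConf = spark.sparkContext.hadoopConfiguration

    // Ask S3 to encrypt new objects at rest (SSE-S3); reads of encrypted
    // objects are decrypted transparently. Credentials are best supplied
    // by an IAM instance profile rather than hard-coded keys.
    hadoopConf.set("fs.s3a.server-side-encryption-algorithm", "AES256")

    val events = spark.read.parquet("s3a://my-bucket/raw/events/") // placeholder bucket
    events.printSchema()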

Implementing SQL query tuning, cache optimization, and parallel execution techniques. Must be hands-on coding capable in at least one core language (Python, Java, or Scala) with Spark.
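
A short sketch of two of those techniques in Spark with Scala, caching a reused dataset and tuning shuffle parallelism; the paths and column names are hypothetical:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.sum

    val spark = SparkSession.builder().appName("tuning-demo").getOrCreate()
    import spark.implicits._

    // Match shuffle parallelism to the cluster instead of the default of 200.
    spark.conf.set("spark.sql.shuffle.partitions", "400")

    val orders  = spark.read.parquet("s3a://my-bucket/orders/")  // placeholder path
    val shipped = orders.filter($"status" === "SHIPPED").cache() // reused twice below
    shipped.count() // materializes the cache

    shipped.groupBy($"region").count().show()
    shipped.groupBy($"customer_id").agg(sum($"amount")).show()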

Scala: should be strong in troubleshooting and know ETL/ELT. Distributed processing using Spark, including how to optimize the processing and tune performance (knows PySpark). Knowledge of Docker: what Docker is and how to build, test, and deploy with it.
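
One common optimization and troubleshooting pattern, sketched below with hypothetical table paths: broadcast the small side of a join to avoid a shuffle, and read the physical plan with explain():

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.broadcast

    val spark = SparkSession.builder().appName("join-tuning").getOrCreate()

    val facts = spark.read.parquet("s3a://my-bucket/facts/") // large table
    val dims  = spark.read.parquet("s3a://my-bucket/dims/")  // small lookup table

    // Broadcasting the small side avoids shuffling the large table.
    val joined = facts.join(broadcast(dims), Seq("dim_id"))

    // explain() prints the physical plan: the first thing to inspect
    // when troubleshooting a slow Spark job.
    joined.explain()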

Expertise in working with distributed data warehouses and cloud services (such as Snowflake, Redshift, and AWS) via scripted pipelines, leveraging frameworks and orchestration tools like Airflow as required for ETL. This role intersects with the "big data" stack to enable varied analytics, ML, and similar use cases, not just data-warehouse-type workloads.
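
Airflow DAGs themselves are written in Python, but from the Scala side an orchestrated pipeline usually reduces to a parameterized job that the scheduler launches through spark-submit. A minimal sketch, with all paths and column names hypothetical:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    // Source, target, and run date arrive as arguments from the scheduler.
    object NightlyLoad {
      def main(args: Array[String]): Unit = {
        val Array(sourcePath, targetPath, runDate) = args
        val spark = SparkSession.builder().appName(s"nightly-load-$runDate").getOrCreate()

        spark.read.parquet(sourcePath)
          .where(col("event_date") === runDate)
          .write.mode("overwrite")
          .parquet(targetPath)

        spark.stop()
      }
    }

An Airflow task would then launch it with something like spark-submit --class NightlyLoad pipeline.jar s3a://my-bucket/raw/ s3a://my-bucket/curated/ 2024-06-01 (jar name and arguments hypothetical).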

Handling large and complex sets of XML, JSON, Parquet, and CSV data from various sources and databases. Solid grasp of database engineering and design. Identify bottlenecks and bugs in the system and develop scalable solutions. Unit test and document deliverables. Capacity to successfully manage a pipeline of duties with minimal supervision.
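
A sketch of reading all four formats into DataFrames; the paths are placeholders, and the XML reader assumes the external spark-xml package is available:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("multi-format").getOrCreate()

    // JSON, CSV, and Parquet readers ship with Spark.
    val json    = spark.read.option("multiLine", "true").json("s3a://my-bucket/raw/json/")
    val csv     = spark.read.option("header", "true").option("inferSchema", "true").csv("s3a://my-bucket/raw/csv/")
    val parquet = spark.read.parquet("s3a://my-bucket/raw/parquet/")

    // XML needs the external spark-xml package (com.databricks:spark-xml).
    val xml = spark.read
      .format("com.databricks.spark.xml")
      .option("rowTag", "record")
      .load("s3a://my-bucket/raw/xml/")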

Nice to have: StreamSets, Databricks, Airflow orchestration.

DevOps: working knowledge of a DevOps environment. Basic understanding of how to move data into Snowflake from a Delta Lake (Databricks) or on-prem data lake, and some knowledge of Kerberos.
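
A minimal sketch of that Delta-to-Snowflake movement, assuming both the Delta Lake package and the Spark-Snowflake connector are on the classpath; all names and values are placeholders:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("delta-to-snowflake").getOrCreate()

    // Read the Delta table (requires the Delta Lake package on the classpath).
    val events = spark.read.format("delta").load("s3a://my-bucket/delta/events/")

    // Same style of placeholder connection options as the Snowflake example above.
    val sfOptions = Map(
      "sfURL"       -> "myaccount.snowflakecomputing.com",
      "sfUser"      -> "etl_user",
      "sfPassword"  -> sys.env("SNOWFLAKE_PASSWORD"),
      "sfDatabase"  -> "ANALYTICS",
      "sfSchema"    -> "STAGING",
      "sfWarehouse" -> "ETL_WH"
    )

    // Push the Delta data into a Snowflake staging table.
    events.write
      .format("net.snowflake.spark.snowflake")
      .options(sfOptions)
      .option("dbtable", "EVENTS_STAGE")
      .mode("overwrite")
      .save()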
