Position: Integration Consultant / Big Data Developer
Location: Oklahoma City, OK
Duration: Long term (initial 4-week contract, then long-term extension)
Expenses will be paid for the initial 4-week contract.
The customer is migrating all data and analytics workloads to cloud service providers (Azure and AWS) and will be using a number of integrated, complementary, and supporting technologies such as Confluent Kafka, Databricks Spark, Snowflake Cloud Data Warehouse (with Snowpipe), and HVR.
Area of Focus
Hands-on execution of the development and migration of Informatica and SSIS pipelines into Snowflake, leveraging Snowpipe and Databricks (Spark/Scala)
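For context on the kind of work involved, here is a minimal sketch of loading a curated DataFrame into Snowflake from a Databricks/Spark job using the Spark-Snowflake connector. This is an illustration only: the table names, paths, and connection values are placeholders, not details of the engagement, and the connector jar is assumed to be on the classpath.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object SnowflakeLoadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("informatica-to-snowflake").getOrCreate()

    // Connection options; every value here is a placeholder, not a real credential.
    val sfOptions = Map(
      "sfUrl"       -> "<account>.snowflakecomputing.com",
      "sfUser"      -> sys.env("SNOWFLAKE_USER"),
      "sfPassword"  -> sys.env("SNOWFLAKE_PASSWORD"),
      "sfDatabase"  -> "ANALYTICS",
      "sfSchema"    -> "STAGING",
      "sfWarehouse" -> "LOAD_WH"
    )

    // Re-implement the transformation step of a legacy Informatica/SSIS mapping in Spark.
    val source  = spark.read.parquet("/mnt/landing/orders/")   // placeholder landing path
    val curated = source
      .filter("order_status = 'COMPLETE'")
      .withColumnRenamed("order_dt", "ORDER_DATE")

    // Write the result to Snowflake with the Spark-Snowflake connector
    // ("snowflake" is the short format name available on Databricks clusters).
    curated.write
      .format("net.snowflake.spark.snowflake")
      .options(sfOptions)
      .option("dbtable", "ORDERS_CURATED")                      // placeholder target table
      .mode(SaveMode.Overwrite)
      .save()
  }
}
```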
Key Technologies that will be a focus for the team are categorized below:
1. Cloud
Azure (for continued work in 2019), AWS
2. Data Hub
Cloudera (for continued work in 2019), Impala, Snowflake (will add workloads including CDH migration), Snowpipe
3. Data Engineering
Databricks Spark and Apache Spark (CDH), Scala, Confluent Kafka, Apache Kafka (CDH)
4. Data Flow & Data Movement
StreamSets, IoT Hub
5. Data Science
Cloudera Data Science Workbench (for continued work in 2019), R (majority of data science work today), Python, Databricks, AWS SageMaker
6. Storage
HDFS (Cloudera), AWS S3, Azure Blob, ADLS
7. Security
Okta, Azure AD, Likewise
8. Visualization
Spotfire, Arcadia Data, Power BI
Position: Data Engineer/Big Data Developer
Location: Houston, TX
Duration: Long Term
Responsibilities:
Build technical solutions required for optimal ingestion, transformation, and loading of data from a wide variety of data sources using open source, Azure, or GCP frameworks and services (a brief sketch follows this list)
Work with development teams to provide feedback on data-related technical issues and provide the scalable data infrastructure needed for projects
Advise on the design of technical solutions and assist with implementation steps
Continually improve existing solutions and develop documentation
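As a rough illustration of the ingestion side of these responsibilities, below is a minimal sketch of a Spark Structured Streaming job that reads from a Kafka topic and lands the data as Parquet for downstream loading. The broker address, topic, and paths are placeholders, and the spark-sql-kafka connector is assumed to be available.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.current_timestamp

object KafkaIngestSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-ingest-sketch").getOrCreate()

    // Read a Kafka topic as a stream (broker address and topic name are placeholders).
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "sensor-readings")
      .load()

    // Minimal transformation: cast the binary payload to string and stamp the ingest time.
    val events = raw
      .selectExpr("CAST(value AS STRING) AS payload")
      .withColumn("ingested_at", current_timestamp())

    // Land the stream as Parquet for downstream loading (paths are placeholders).
    val query = events.writeStream
      .format("parquet")
      .option("path", "/data/landing/sensor_readings")
      .option("checkpointLocation", "/data/checkpoints/sensor_readings")
      .start()

    query.awaitTermination()
  }
}
```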
Qualifications:
3+ years of professional experience with database and ETL development, including data processing workflows (Kafka, MapR Streams, Spark Streaming, Apache Beam, StreamSets)
2+ years of professional experience with major GCP data platform components (BigQuery, Bigtable, Dataflow, Dataproc, Datalab, Data Studio, Dataprep, Cloud ML Engine) or similar cloud technologies
3+ years of development and implementation of enterprise-level data solutions utilizing Java and Python (scikit-learn, SciPy, pandas, NumPy, TensorFlow)
1+ years deploying container-based workloads, preferably in Kubernetes
Proven collaboration with data scientists and data stewards to uncover detailed business requirements related to data engineering
Proficient in Linux OS command line tools and bash scripting
NoSQL experience (such as HBase, Cassandra, or similar) with time-series data
Experienced in Agile development methodologies
Track record of automating manual processes
Google Data Engineer certification preferred
Excellent communication and presentation skills
Reshmi Pendurthi.
Thanks
Gigagiglet
gigagiglet.blogspot.com