Hi,

Hope you are doing great. Kindly go through the requirement below. If you feel comfortable with it, please revert with your updated resume.

Position: Data Lake/Big Data AWS Resource
Location: Boston, MA
Duration: 6-12+ Months
Interview Mode: Phone and Skype

JOB DESCRIPTION: Data Lake/Hadoop Resource
- Implementation and administration of the on-prem data lake environment
- Monitoring and managing Hadoop services across three clusters
- Installing new hosts (head nodes, compute nodes, and worker nodes) into the existing cluster and decommissioning hosts from the cluster
- Maintaining and monitoring jobs across the Production, UAT, and Development environments
- Making code changes and deploying updated code in the UAT and Production environments
- Deploying code changes on the Shiny Server and RStudio Server hosts per user requests
- Implementing and monitoring Oozie-scheduled jobs
- Performing patching activities and applying fixes provided by Hortonworks to the data lake environment
- Troubleshooting job failures, mostly Hive and Spark jobs, across the data lake environment
- Onboarding new users to the Hadoop data lake environment
- Gathering requirements for creating Hive databases and providing policy-based access management through Ranger for new proofs of concept (POCs) such as Veeva Insights (a policy-creation sketch follows this list)
- Supporting developers in executing ad-hoc jobs in Hive environments for existing POCs such as Enrollment_forecaster (a query sketch also follows this list)
- Managing Ranger access policies for HDFS home directories and at the Hive schema, table, and column level
- Implementing security and managing Active Directory-based Kerberos authentication across the data lake clusters
- Implementing SSL for Ambari and other HDP services in the Hortonworks environment across the data lake clusters
- Managing encryption and decryption of user data using Ranger KMS across the data lake clusters
- Installing and upgrading JupyterHub and Python packages to support developers implementing code in on-prem environments
- Working with the HPC team on hardware issues and the allocation of physical resources for the data lake environment
- Implementing Hail on Spark and analyzing UK Biobank genotype and phenotype datasets
- Installing the latest versions of Spark and Hail and optimizing resources for loading very large datasets
- Working with the Hortonworks team on the planned upgrade of HDP from version 2.6 to 3.0
- Supporting and maintaining MongoDB servers in the data lake
- Maintaining source code repositories in Bitbucket
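As an illustration of the Ranger policy work above: database-, table-, and column-level Hive policies like these are commonly scripted against the Ranger Admin public REST API rather than clicked through the UI. The following is a minimal sketch; the endpoint, Hive service name, database, group, and credentials are all hypothetical placeholders, not details from this environment.

```python
# Minimal sketch: creating a Hive access policy through the Ranger Admin
# public REST API (v2). All names below are hypothetical placeholders.
import json
import requests

RANGER_URL = "https://ranger.example.com:6080"  # assumed Ranger Admin endpoint

policy = {
    "service": "datalake_hive",            # assumed Ranger Hive service name
    "name": "veeva_insights_read",
    "description": "Read-only access for the Veeva Insights POC",
    "isEnabled": True,
    "resources": {
        "database": {"values": ["veeva_insights"], "isExcludes": False},
        "table":    {"values": ["*"], "isExcludes": False},
        "column":   {"values": ["*"], "isExcludes": False},
    },
    "policyItems": [
        {
            "groups": ["veeva_poc_users"],  # hypothetical AD group
            "accesses": [{"type": "select", "isAllowed": True}],
        }
    ],
}

resp = requests.post(
    f"{RANGER_URL}/service/public/v2/api/policy",
    auth=("admin", "changeit"),             # replace with real credentials
    headers={"Content-Type": "application/json"},
    data=json.dumps(policy),
)
resp.raise_for_status()
print("Created policy id:", resp.json().get("id"))
```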
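Likewise, supporting developers with ad-hoc Hive jobs usually means running queries through HiveServer2. A minimal sketch using the PyHive client, with a placeholder host, database, and table; a Kerberized cluster like this one would additionally need auth="KERBEROS" and the HiveServer2 service principal.

```python
# Minimal sketch: running an ad-hoc query against HiveServer2 with PyHive.
# Host, database, and table are hypothetical placeholders.
from pyhive import hive

conn = hive.connect(
    host="hs2.example.com",              # assumed HiveServer2 host
    port=10000,
    username="devuser",
    database="enrollment_forecaster",    # placeholder POC database
)
cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM forecasts")  # hypothetical table
print(cursor.fetchone())
cursor.close()
conn.close()
```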
In addition to the above tasks, the resource will also perform the following AWS activities:
- Supporting the Cloudbreak server in AWS for the Hortonworks Cloudbreak deployment
- Supporting software upgrades for Cloudbreak and HDP package installation in the AWS cluster
- Supporting data scientists with any technical issues during the execution of Spark-Hail jobs in the Cloudbreak AWS cluster
- Setting up the latest versions of Spark and Hail in the AWS Spark cluster (a Hail setup sketch follows this list)
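To illustrate the Spark-Hail setup mentioned above (both on-prem and in the Cloudbreak AWS cluster), a typical session initializes Hail 0.2 on the cluster's Spark and runs a simple genotype/phenotype analysis. The paths, phenotype column, and Spark settings below are hypothetical placeholders, not actual UK Biobank locations.

```python
# Minimal sketch: initializing Hail 0.2 on an existing Spark cluster and
# regressing a phenotype on genotype allele counts (a basic GWAS pattern).
# All paths and field names are hypothetical placeholders.
import hail as hl

hl.init(spark_conf={"spark.executor.memory": "8g"})  # tune for dataset size

# Load a genotype matrix table and a keyed phenotype table (placeholder paths).
mt = hl.read_matrix_table("hdfs:///data/ukbiobank/genotypes.mt")
pheno = hl.import_table(
    "hdfs:///data/ukbiobank/phenotypes.tsv",
    key="sample_id",
    impute=True,
)
mt = mt.annotate_cols(pheno=pheno[mt.s])

# Linear regression of the phenotype on alternate-allele counts,
# with an intercept as the only covariate.
gwas = hl.linear_regression_rows(
    y=mt.pheno.height,                   # hypothetical phenotype column
    x=mt.GT.n_alt_alleles(),
    covariates=[1.0],
)
gwas.show(5)
```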