Founded in 2000, our Client is a strategic analytics partner to the most admired Fortune 500 companies globally and helps them power every human decision in the enterprise by bringing analytics and AI to the decision. They have a 1000+ member team across 12 global locations, including the United States, the UK and India, and have recently been featured as a “Hot Artificial Intelligence (AI)” company by Forbes. They have also been recognized as a “Cool Vendor in Analytics” and a “Vendor to watch” by Gartner.
Implement solutions, including loading data from disparate data sets and pre-processing it using Hive and Pig.
Scope and deliver various Big Data solutions
Ability to design solutions independently based on high-level architecture.
Manage the technical communication between the team and client
Build a cloud-based platform that allows easy development of new applications
3-8 years of demonstrable experience designing technological solutions to complex data problems, developing & testing modular, reusable, efficient and scalable code to implement those solutions.
Ideally, this would include work on the following technologies:
Expert-level proficiency in Python (preferred) or R. Scala knowledge is a strong advantage.
Strong understanding and experience in distributed computing frameworks, particularly Apache Hadoop 2.0 (YARN; MR & HDFS) and associated technologies — one or more of Hive, Sqoop, Avro, Flume, Oozie, Zookeeper, etc.
Hands-on experience with Apache Spark and PySpark
Basic data science skills such as clustering, regression and decision trees
Ability to work in a team in an agile setting, familiarity with JIRA and a clear understanding of how Git works
In addition, the ideal candidate would have great problem-solving skills, and the ability & confidence to hack their way out of tight corners.
Education: B.E./B.Tech. in Computer Science or a related technical degree