Description

The Data ML Engineer, Infrastructure will support our Technology team and BakerML to create new value for clients, our firm, and our communities around the globe. This person will be responsible for expanding and optimizing our data and data pipeline architecture, as well as optimizing data flow and collection for cross-functional teams.

The Data Engineer will support our software developers, database architects, data analysts and data scientists on data initiatives and will ensure optimal data delivery architecture is consistent throughout ongoing projects. This person will perform deployment, operations and maintenance activities for all systems around the firm's data initiatives. This person will enforce standards, policies, and procedures related to the operation of sharing firm data with systems, data tools, and users of the firm's data.

This person will be working with both technical and business teams to deliver data solutions and will collaborate with data scientists, solution architects, and business stakeholders to design, build, refine, deploy, and operate production ML pipelines and solutions. This person will unify ML solution development and ML solution deployment by bringing together the disciplines of DevOps, Data Engineering, Machine Learning Engineering, and ML Research to standardize and streamline the continuous delivery of high-performing models in production for business use.


Responsibilities:
 

  • Create and maintain optimal data pipeline architecture
  • Assemble large, complex data sets that meet functional / non-functional business requirements
  • Develop data/dev/ml pipelines based on in-depth knowledge of cloud platforms (Azure, AWS, GCP), MLOps lifecycle management (build, deploy, production support), and business requirements to ensure solutions are delivered efficiently, reliably, and sustainably
  • Identify, design, and implement internal process improvements (automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.)
  • Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and Azure ‘big data’ technologies
  • Create and deliver automated deployments and operations of ML-based solutions
  • Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency and other key business performance metrics
  • Build prototypes that demonstrate an understanding of business and user requirements as well as knowledge of AI/ML tools
  • Translate prototypes into production-grade code with appropriate guardrails 
  • Work with stakeholders including the Executive, Data, Analytics and Service Design teams to assist with data-related technical issues and support their data infrastructure needs
  • Keep data separated and secure across national boundaries through multiple data centers and Azure tenants
  • Create data tools for analytics and data scientist team members that assist them in building and optimizing our product into an innovative industry leader
  • Work with data and analytics experts to strive for greater functionality in our data systems
  • Deploy sophisticated analytics programs, machine learning models, and statistical methods
  • Implement data flows for analytics and business intelligence (BI) systems
  • Identify and use the appropriate analytical techniques, and keep up to date with advances in digital analytic tools, data manipulation techniques, and products
  • Demonstrate how to expose data from systems (for example, through APIs), link data from multiple systems and deliver streaming services
  • Work with stakeholders including data, design, product and business teams, and assist them with data-related technical issues
  • Build analytical tools to utilize the data pipeline, providing actionable insight into key business performance metrics including operational efficiency, customer trends and habits
  • Train prediction models, including models that are forward-looking and not production critical


Skills and Experience:
 

  • Degree in Computer Science, Statistics, Informatics, Information Systems or another quantitative field
  • Some experience in a Data Engineer AI/ML role and with building and optimizing data sets, data pipelines and architectures
  • Experience with big data tools (Hadoop, Spark, Kafka, etc.), with data pipeline and workflow management tools, and with building and optimizing ‘big data’ data pipelines, architectures and data sets
  • Advanced proficiency with both structured and unstructured data (SQL, NoSQL), with expertise in writing SQL queries and query performance tuning
  • Extensive knowledge of the ML lifecycle and concepts
  • Experience building and deploying ML-based solutions and proficiency in common machine learning frameworks such as TensorFlow, Keras, scikit-learn, PyTorch, and ONNX
  • Experience deploying ML solutions on Azure, AWS or GCP cloud platforms and exposure to the cloud services necessary to build end-to-end pipelines
  • Hands-on experience with web APIs, CI/CD for ML, and Serverless Deployment
  • Knowledge of version control software, such as TFS, Git, or SVN
  • Experience with stream-processing systems
  • Experience with object-oriented and functional scripting languages: Python, Java, Shell, C#, R, etc.
  • Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement
  • Experience supporting and working with cross-functional teams in a dynamic environment
  • Demonstrated understanding of machine learning concepts
  • Strong analytic skills related to working with unstructured datasets
  • Experience building processes that support data transformation, data structures, metadata, dependency and workload management
  • A successful history of manipulating, processing and extracting value from large disconnected datasets
  • Working knowledge of message queuing, stream processing, and highly scalable ‘big data’ data stores
  • Ability to define, maintain and improve high code standards by leading others and putting the right process in place
  • Ability to build processes that support data transformation, workload management, data structures, dependency and metadata
  • Can demonstrate initiative, ownership, and self-motivation
  • Excellent people skills
  • Being a team player is a must
  • Effective written and oral communication skills
  • Strong project management and organizational skills