WORK HISTORY
2020
Lead Senior Data Architect, Professional Services
Amazon Web Services, Toronto
Served as technical lead for customer, partner, and AWS teams delivering projects for AWS customers. Designed data architectures for the cloud migration of data workflows, data lakes, and Hadoop installations, as well as for net-new data infrastructure in AWS. Worked hands-on with teams to solve data engineering problems and improve their cloud and data skills. Additionally worked with diverse customer teams to resolve networking, security, and identity management issues, often bringing senior leaders to the table to ensure they were on board with all changes being made.
As internal Team Lead, served as a formal mentor and coach for up to 3 data architects, and an informal one for the whole team. Acted as a sounding board for architecture questions as well as for approaches to consulting and communication challenges.
Contributed to the Amazon hiring process by completing 5 interviews/phone screens per month, helping to grow the Canadian team by 50% in my first 18 months.
Technology used: Glue, Athena, EMR, MSK, Lambda, Lake Formation, S3, IAM, CloudFormation, Spark, Python, NiFi, Active Directory
2017
Senior Tech Lead, Data
RBC Investor and Treasury Services, Toronto
Contributed to the architecture, design, and development of pipelines ingesting data from several large legacy systems into the Data Lake, transforming it with Spark, NiFi, and pure Java, and exposing it through APIs and a web GUI built on microservices.
Established a ‘Big Data Platform’ as the guiding architecture for data applications, and introduced Kafka with Avro to the program, making near real-time flows a key component of the Platform.
Participated in a PoC of Palantir Foundry and its eventual integration.
Technology used: Spark, NiFi, Kafka, YARN, Hive, HDFS, krb5, Palantir Foundry, HDP 2.4/2.6, Confluent 4/5, RedHat 7
2016
Hadoop & Big Data Consultant
T4G, Toronto
Telus
Converted a batch-based system for TV customer experience modelling to near real-time: Hive over HDFS for static data; NiFi, Kafka, and Spark Streaming for near real-time flows; and SparkML for customer modelling. The work was presented at Dataworks Summit 2017 in San Jose, with slides and audio available: Bringing Real Time to the Enterprise with Hortonworks DataFlow. Key parts are also available as a blog entry: Spark 2.0 streaming from SSL Kafka with HDP 2.4.
Technology used: Spark, NiFi, Kafka, YARN, Hive, HDFS, krb5, HDF 2.0, HDP 2.2/2.4, RedHat 6/7
Financial Services Client
Optimized HBase for time series data, set up Jupyter notebooks, and used Spark (PySpark) and HBase for processing, with Hive over HDFS for static data. HDP 2.3
2016
Hadoop & Big Data Consultant
EyeReturn Marketing, Inc., Toronto
Consulted on cluster setup and tuning for Hadoop (HDFS, Pig, YARN), HBase, and Spark. CDH 5.5
2010
Senior Software Developer & Scrum Master
GBIF, Copenhagen
Worked as a developer and the Scrum Master for a team of 4-8 developers to support the transition of gbif.org from a batch-oriented, MySQL-based system to a real-time processing system based on Hadoop. The system went live in 2013 and has been running on two in-house Cloudera (CDH) clusters. Responsible for cluster installation, upgrades, maintenance, and performance tuning, and part of the team that designed the architecture of the overall system. Wrote the portion of the system that speaks directly to HBase, including architecting the key and column structure, region sizes, etc. Contributed to all aspects of the Hadoop development, including Hive UDFs, Sqoop-ing in and out of the cluster, Oozie workflows, custom MapReduce jobs in both versions 1 and 2 (YARN), as well as interactions with Zookeeper and Solr. Served as the primary DevOps liaison between System Administrators and the development team.
Additionally built RESTful, JSON-based web services in Java to deliver data from both CDH and traditional SQL databases (MySQL, PostgreSQL, PostGIS), managed in a continuous build environment using Jenkins, Maven, and Nexus. Helped build the analytics portion of the site in a combination of R and Hadoop.
Technology used: Hadoop (HDFS, Hive, Zookeeper, HBase, MapReduce, YARN, Sqoop, Oozie, SolrCloud), CDH 3/4/5, Java 6/7, R, RabbitMQ, Maven, Jenkins, Nexus, Varnish, Puppet, Ansible, Ganglia, Elasticsearch, Kibana, Git, IntelliJ, JIRA
2008
Software Architect & Team Lead
Zerofootprint, Toronto
Responsible for designing and developing services, messaging infrastructure, and web and API clients within an SOA for a suite of enterprise environmental products. Notable among them were the Velo enterprise carbon management package and the TalkingPlug energy management hardware devices and software.
Technology used: Java 5/6, SOA, ESB, Mule, JMS, ActiveMQ, Hadoop/HBase, SaaS, Spring MVC, Hibernate, Maven, Eclipse, Web Services, Ruby, JRuby, Rails
2007
Technical Lead & Senior Developer
TSOT, Toronto
Maintained and augmented a private social networking application for the fraternity, sorority, and university market, written in Ruby on Rails. Designed and implemented a staged release process around a Subversion server. Served as Technical Lead for a team of 5 spanning front-end, back-end, and QA. Prioritized and organized work across the team to meet business goals.
Technology used: Ruby on Rails (1.2 & 2), MySQL, Mongrel, Subversion, Apache, TextMate, OS X, Linux
2006
Senior Java Developer
Penson Financial Services, Toronto
Built a Trade Order Management System for a fixed income trading platform, delivered as a service in an SOA environment with connections to Bloomberg: incoming trades via BTS and outgoing trades via their Consolidated Message Feed (CMF).
Technology used: Java 5, J2EE, SOA, ESB, Mule, Websphere MQ/IBM MQSeries, JBoss, Spring, Hibernate, Maven, Eclipse, Subversion, Drools, XML, REST