Data Engineering


This Specialist-level certification will benefit practicing or aspiring data engineers, data scientists, data stewards, or anyone responsible for managing or processing data sets.

Develop Essential Data Engineering Skills from Programming Fundamentals to Data Security and IoT Data

Data engineers design, manage, analyze, and optimize data within an organization, and theirs has become one of the most important and in-demand roles at organizations across a wide range of industries. Data engineers need advanced programming skills and an understanding of data management systems and data pipelines. Our courses will challenge you in every area, from data analytics to preparing data for end users.

Course Modules

Courses are offered as individual modules to accommodate different learning requirements. Each module prepares the learner for a key portion of the Dell Technologies Proven Professional data engineering specialist-level certification exam. On-demand labs covering each module are also offered to reinforce the course curriculum.

Introduction to Data Engineering - Free Course
For over 10 years, there has been an intense focus by companies on extracting business value from their data. Out of this activity, a role called the data scientist emerged. However, it quickly became obvious that a majority of a data scientist's time was spent on data preparation or on moving analytical models into production environments. Thus, the data engineer has emerged as a highly desirable and indispensable member of an analytics project team. The course covers the data analytics life cycle, the role of the data engineer, and the skill set of a successful data engineer.
Processing Streaming and IoT Data
Due to the proliferation of smart devices and the growing need for real-time analysis, the legacy batch processing approaches are insufficient to support modern applications such as credit card fraud detection, cybersecurity protection, and automobile navigation. This course introduces such use cases and the approaches to process streaming data generated by the Internet of Things (IoT). To process and analyze IoT data, this course covers several Apache tools: Storm, Kafka, Spark Streaming, and Flink. Additionally, Pravega, a new storage paradigm, is covered as well as emerging IoT projects, Project Nautilus and EdgeX Foundry. The on-demand course includes recorded lab exercise demonstrations of the Apache tools.
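The core operation these tools perform — grouping a continuous event stream into time windows and aggregating each window — can be illustrated without a cluster. The sketch below is a plain-Python illustration of tumbling-window aggregation over a simulated IoT feed; the event schema, sensor names, and window size are invented for illustration and are not part of any Apache API.

```python
from collections import defaultdict

# Simulated IoT events: (timestamp_seconds, sensor_id, reading).
# The schema and values are illustrative assumptions.
events = [
    (0, "s1", 21.0), (2, "s2", 19.5), (4, "s1", 21.4),
    (6, "s1", 22.1), (7, "s2", 19.9), (11, "s2", 20.3),
]

def tumbling_window_avg(stream, window_seconds=5):
    """Group events into fixed, non-overlapping time windows and
    average the readings per sensor -- the basic aggregation that
    Storm, Spark Streaming, and Flink perform at scale."""
    windows = defaultdict(list)
    for ts, sensor, reading in stream:
        window_start = (ts // window_seconds) * window_seconds
        windows[(window_start, sensor)].append(reading)
    return {key: sum(vals) / len(vals) for key, vals in windows.items()}

result = tumbling_window_avg(events)
```

In a real deployment the stream would arrive continuously (for example, via Kafka topics) and the engine would handle partitioning, late events, and fault tolerance — the parts this toy version leaves out.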
Data Warehousing with SQL and NoSQL
With the rapid growth in the volume of data, organizations continue to be challenged with how to transact and later analyze this data in a timely manner. This course covers traditional data warehousing with SQL and newer approaches such as data lakes and NoSQL tools. Providing a mixture of theory and practical considerations, the on-demand course includes recorded lab exercise demonstrations of the Greenplum Database, Redis, Apache Cassandra, and Apache CouchDB.
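As a small taste of the SQL side of this material, here is a minimal warehouse-style query using Python's built-in sqlite3 module. SQLite stands in for a full warehouse engine such as Greenplum, and the table and column names are invented for illustration — they are not course material.

```python
import sqlite3

# A toy star schema: one fact table joined to one dimension table.
# The schema and data below are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_sales (product_id INTEGER, amount REAL);
    INSERT INTO dim_product VALUES (1, 'widgets'), (2, 'gadgets');
    INSERT INTO fact_sales VALUES (1, 10.0), (1, 5.0), (2, 7.5);
""")

# The classic warehouse pattern: join facts to a dimension, then aggregate.
rows = conn.execute("""
    SELECT d.category, SUM(f.amount)
    FROM fact_sales f JOIN dim_product d USING (product_id)
    GROUP BY d.category ORDER BY d.category
""").fetchall()
```

NoSQL tools such as Redis approach the same data differently — as keys and values optimized for fast lookups rather than ad hoc joins — which is the trade-off the course examines.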
Building Data Pipelines with Python
A key responsibility of a data engineer is to automate the movement and transformation of data from source systems to data repositories such as data lakes. This course covers the fundamentals of Python programming and how to construct data pipelines. This on-demand course includes recorded lab exercise demonstrations of basic Python programming techniques and how Python is used to process and transform data.
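A pipeline of the kind this course builds can be sketched in miniature as three composable stages. The CSV layout, field names, and cleaning rule below are illustrative assumptions, not the course's own lab code.

```python
import csv
import io

# Illustrative source data; in practice this would come from a file,
# database, or API feed.
RAW = "sensor,reading\ns1,21.5\ns2,\ns1,19.0\n"

def extract(text):
    """Parse CSV text into row dicts (the 'E' in ETL)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Drop rows with missing readings and cast types (the 'T')."""
    return [
        {"sensor": r["sensor"], "reading": float(r["reading"])}
        for r in rows if r["reading"]
    ]

def load(rows, target):
    """Append cleaned rows to a target store (the 'L')."""
    target.extend(rows)
    return target

# Compose the stages: raw text in, a clean "data lake" list out.
lake = load(transform(extract(RAW)), [])
```

Keeping each stage a plain function makes the pipeline easy to test and to rearrange — the same principle that workflow tools apply at production scale.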
Data Governance, Security and Privacy for Big Data
This course introduces data governance and data management approaches applicable to a wide range of use cases. Focusing on the various governance roles and models, this course prepares the learner to implement and maintain a successful data governance program. Additionally, the course covers several Apache tools, Atlas, Ranger, and Knox, that enable the management and control of an organization’s datasets in Big Data environments such as Hadoop. The on-demand course includes recorded lab exercise demonstrations of these Apache tools.
ETL Off-load with Hadoop and Spark
As data volumes continue to grow, more and more demands are placed on data warehousing resources to not only host an organization’s data, but to also merge and prepare the data for end-user consumption. This course covers how off-loading the Extract-Transform-Load (ETL) processes from a data warehouse can improve delivery times and enable the inclusion of new sources of data. This on-demand course includes recorded lab exercise demonstrations of several Apache Hadoop ecosystem tools: Hadoop Distributed File System (HDFS), Spark, Flume, Sqoop, and Oozie.
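The off-load pattern amounts to moving the transform work onto a parallel engine. The sketch below mimics, in plain Python, how Spark splits a dataset into partitions, aggregates each partition independently, and merges the partial results; the function names and data are illustrative, and real code would use the PySpark API against data in HDFS.

```python
from collections import Counter

# An illustrative click log; with Spark this would be a distributed
# dataset read from HDFS rather than an in-memory list.
log = ["home", "cart", "home", "checkout", "home", "cart"]

def partition(data, n):
    """Split the dataset into n roughly equal partitions."""
    return [data[i::n] for i in range(n)]

def map_partition(part):
    """Aggregate one partition -- the work a Spark executor would do."""
    return Counter(part)

def reduce_counts(partials):
    """Merge the per-partition results, as Spark's shuffle/driver would."""
    total = Counter()
    for p in partials:
        total += p
    return total

counts = reduce_counts(map_partition(p) for p in partition(log, 2))
```

Because each partition is processed independently, the transform scales out across cluster nodes instead of consuming warehouse resources — the central idea behind ETL off-load.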

Engage your local Dell Learning Account Manager for local pricing information and to schedule classes. Visit us online or call +1 888 362 8764 (US).