Cutting edge training in epidemiology and related disciplines

Division: Epidemiology | Location: London, Budapest, Washington DC, Hamburg, Worldwide


Say goodbye to boring training sessions, pointless classroom time and generic advice. CBRD's experts deliver project-driven training sessions that draw on their experience working on some of the most challenging problems in public health and epidemiology. Instead of classroom boredom, you will work on real-life case studies and leave with actionable insights you can implement in your practice right away.

  • Python for epidemiologists

    Over the last few decades, Python has emerged as the de facto gold standard in scientific computing in quantitative disciplines. Our Python for epidemiologists course teaches practising epidemiologists with some experience with another programming language (R, SAS, etc.) the basics of Python, as well as the fundamentals of managing datasets in pandas and quantitative techniques in numpy.


    3 days (abbreviated course), 5 days (full course), 8 days (project-based extended course)

    Provision and maximum participants

    On-site or at a specialized venue. We recommend that project-based extended courses be held on-site for security reasons. The maximum class size is 12 attendees.


    Items marked as elective can be added on request to the 5-day and 8-day project-based courses.

    • Basic Python programming: logic, recursion, functions and lambdas, basic OOP
    • Detailed object oriented programming, inheritance, prototypes and object-oriented design patterns
    • The IPython/Jupyter development environment
    • Introduction to the PyCharm IDE
    • Reading data from various sources into a pandas data frame
    • Using pandas with an SQL database
    • Using pandas with HDFS
    • Manipulating and filtering data frames
    • Time series data in pandas
    • Split-apply-combine
    • Elementary plotting with matplotlib and Seaborn
    • Plotting epi curves in matplotlib
    • Plotting Kaplan-Meier survival plots
    • Basic epidemiological analyses and plotting using epipy
    • Plotting case trees with epipy
    • Managing projects in git and integrating git with Jupyter
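
As a taste of the split-apply-combine and time series topics listed above, here is a minimal sketch of how weekly case counts for an epi curve might be derived in pandas. The line list, its column names (`onset_date`, `sex`) and the dates are made up for illustration; this is not taken from the course material.

```python
import pandas as pd

# Hypothetical line list: one row per case (illustrative data only).
line_list = pd.DataFrame(
    {
        "onset_date": pd.to_datetime(
            ["2018-01-01", "2018-01-02", "2018-01-03", "2018-01-09", "2018-01-10"]
        ),
        "sex": ["F", "M", "F", "M", "M"],
    }
)

# Split by calendar week (weeks ending on Sunday) and by sex,
# apply a count to each group, then combine into a week-by-sex table.
weekly = (
    line_list
    .groupby([pd.Grouper(key="onset_date", freq="W"), "sex"])
    .size()
    .unstack(fill_value=0)
)

print(weekly)
```

The resulting table has one row per week and one column per group, which is the shape a stacked bar chart (the usual form of an epi curve) expects.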
  • Epidemiology in R

    Owing largely to its extensive ecosystem of packages and applications, R remains one of the most popular choices for data science, statistics and epidemiology. Our modular Epidemiology in R programme consists of six modules (two general modules on R and four 'deep dives'), each lasting five days. The modules can be combined according to client needs and use cases.


    5 days per module.

    Provision and maximum participants

    On-site or at a specialized venue. The maximum class size is 18 attendees.

    Module 1: Beginning R

    Hit the ground running with a module that teaches you everything you need to know to get started with R, from data frames to RStudio.

    • Getting to know R in the command line
    • RStudio: basic concepts
    • Data types: primitives, composite types, data frames and data.table objects
    • Functions and control flow
    • Loading and saving data
    • Subsetting, filtering and transforming data
    • Basic statistical functions

    Module 2: Advanced R programming

    Learn the skills you need for more demanding analytical tasks.

    • Vectorisation and *pply functions
    • Package management: installing and managing packages
    • RMarkdown, knitr and report generation in R
    • Advanced statistical functions
    • Distribution and model fitting
    • Basic plotting skills
    • Packaging analytical outputs

    Module 3: Data preparation and management

    Master data manipulation and exploratory analysis, and learn how to interact with a range of data sources.

    • Interacting with a range of data formats: HDFS, Excel, CSV/TSV, HDF5, SQL and NoSQL databases
    • Filtering, sorting and advanced data manipulation
    • Using dplyr and the split-apply-combine paradigm for data analysis
    • Recoding datasets
    • Data preparation for statistical analyses

    Module 4: Statistics and inference testing

    Leverage R's powerful statistical algorithms for inference testing and statistical applications on large datasets.

    • The statistical basis of inference testing (t-test, CLT, confidence intervals)
    • Correlation testing
    • Using heatmaps to visualize correlations
    • Inference for high-throughput data
    • Statistical modeling using linear regression and GLMs

    Module 5: Bioconductor and biostatistics in R

    Master the techniques crucial for biostatistics and genomic data processing.

    • Introduction to Bioconductor
    • Managing genome-scale data
    • Genomic annotation using Bioconductor
    • Gene-wise t-tests and genomic comparisons
    • Using limma to perform moderated t-tests
    • Analysing DNA methylation
    • Multi-omic source integration
    • ChIP-seq and RNA-seq analysis
    • Visualizing genomic data

    Module 6: Data visualization and communication

    Learn techniques for presenting complex epidemiological and genomic data in an effective, professional manner, create publication-ready visualizations and package analyses for reproducibility.

    • Theory of visualisation and the grammar of graphics
    • Basic plotting paradigms in ggplot2
    • Creating epidemiological plots in ggplot2: epi curves, case trees, etc.
    • Genomic visualization using circos
    • Avoiding visual pitfalls and misleading visualizations
    • Exporting visualisations to publication-ready vector formats
    • Interactive visualisations using Shiny
    • Web-embeddable visualisations from R

  • Graph methods for epidemiologists

    Graph theory is almost naturally suited for the analysis and simulation of communicable diseases in a population. Our Graph methods for epidemiologists course introduces practising epidemiologists with some background in using Python or R to the basics of graph theory, modern graph databases and graph analytics using Gephi and Neo4j.


    5 days

    Provision and maximum participants

    On-site or at a specialized venue. The maximum class size is 12 attendees.

    • Graph theory and representation of populations as graphs
    • Graph representation in code
    • Constructing graphs in Neo4j and interacting with a Neo4j graph from Python or R
    • Graph representation and manipulation in Python using NetworkX
    • Multi-property graphs in Neo4j
    • Graph front-ends
    • Graph analysis with Gephi
    • Graph visualisation with Gephi
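
To illustrate what "graph representation in code" can look like, the sketch below stores a contact network as a plain adjacency dictionary and uses breadth-first search to find how many contact steps separate each person from an index case. The function names and the contact pairs are hypothetical and chosen for illustration; the course itself works with NetworkX and Neo4j rather than hand-rolled structures.

```python
from collections import deque


def build_contact_graph(contacts):
    """Build an undirected adjacency dict from (person, person) contact pairs."""
    graph = {}
    for a, b in contacts:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def contact_distance(graph, index_case):
    """Breadth-first search: number of contact steps from the index case."""
    distance = {index_case: 0}
    queue = deque([index_case])
    while queue:
        person = queue.popleft()
        for neighbour in graph[person]:
            if neighbour not in distance:
                distance[neighbour] = distance[person] + 1
                queue.append(neighbour)
    return distance


# Hypothetical contact pairs; F and G form a separate cluster.
contacts = [("A", "B"), ("A", "C"), ("B", "D"), ("D", "E"), ("F", "G")]
graph = build_contact_graph(contacts)
distances = contact_distance(graph, "A")
print(distances)
```

People who never appear in the result (here F and G) have no chain of contacts leading back to the index case.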
  • Computational pharmacovigilance in R and Python

    Pharmacovigilance is the art and science of detecting adverse drug reactions (ADRs) from various reporting sources and analysing risks to assist decision-making on risk management. This course on computational pharmacovigilance assists epidemiologists and statisticians working in the pharmacovigilance sector to exploit passive and active reporting systems, detect safety signals and identify groups at risk.


    5 days

    Provision and maximum participants

    On-site or at a specialized venue. The maximum class size is 12 attendees. Attendees are required to provide proof of employment by a regulatory agency, pharmaceutical vendor/manufacturer or other professional organisation in the pharmacovigilance field.

    • Pharmacovigilance: legal obligations and ethical guidelines
    • Passive vs active reporting
    • Disproportionality measures: PRR, RRR, chi-square with Yates' correction, ROR
    • Bayesian methods and DuMouchel's EBGM method for signal generation
    • Setting signal thresholds
    • Implementing EBGM in R using the ebgm package and visualizing results
    • Filtering and normalizing entities from various databases, e.g. VAERS, to make suitable for EBGM analysis
    • Text mining using nltk and deep learning models in keras to leverage unstructured data
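
For readers unfamiliar with the disproportionality measures listed above, the following sketch computes PRR, ROR and the chi-square statistic with Yates' correction from a 2×2 table of spontaneous reports, using the standard textbook definitions. The class name, function names and report counts are hypothetical and not taken from any real database or from the course material.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContingencyTable:
    a: int  # drug of interest, event of interest
    b: int  # drug of interest, all other events
    c: int  # all other drugs, event of interest
    d: int  # all other drugs, all other events


def prr(t):
    """Proportional reporting ratio: [a / (a + b)] / [c / (c + d)]."""
    return (t.a / (t.a + t.b)) / (t.c / (t.c + t.d))


def ror(t):
    """Reporting odds ratio: (a * d) / (b * c)."""
    return (t.a * t.d) / (t.b * t.c)


def yates_chi_square(t):
    """Chi-square statistic for a 2x2 table with Yates' continuity correction."""
    n = t.a + t.b + t.c + t.d
    # Assumption: the corrected difference is clamped at zero, a common
    # convention so that the correction never inflates the statistic.
    corrected = max(abs(t.a * t.d - t.b * t.c) - n / 2, 0)
    denominator = (t.a + t.b) * (t.c + t.d) * (t.a + t.c) * (t.b + t.d)
    return n * corrected ** 2 / denominator


# Hypothetical report counts, for illustration only.
table = ContingencyTable(a=20, b=80, c=100, d=9800)
print(f"PRR = {prr(table):.2f}")
print(f"ROR = {ror(table):.2f}")
print(f"Yates chi-square = {yates_chi_square(table):.2f}")
```

The sketch assumes that none of the cells or margins are zero; real reporting data usually needs a continuity adjustment or a shrinkage method such as EBGM for sparse tables.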
  • Synthetic populations in R

    A synthetic population is an anonymized data set generated on the basis of a source data set, such as census data or medical records, that exhibits the same statistical properties as the source data set in all material respects. Synthetic populations are useful for agent-based modeling and epidemiological simulations where ethical and data protection guidelines would preclude using real data.


    3 days

    Provision and maximum participants

    On-site or at a specialized venue. The maximum class size is 8 attendees. Attendees should have a sound proficiency in R.


    • Synthetic population theory
    • Legal and data protection aspects of synthetic populations
    • Using synthpop in R to generate synthetic populations
    • Attaching a synthetic population to an agent-based model
    • Case study: modeling transmission of STDs in a synthetic population
  • Building and operating field deployable data analysis capabilities

    Operating a data management and collection facility in the field is a very different challenge from lab-based data work. This course, taught by experts in data science, operations and networking, will teach you how to prepare for field deployment, select the right kind of hardware and software, establish reliable communication links and operate a field data service. Recommended for operations managers in public health, this course is an indispensable practical guide to making your first field operation a success.


    5 days

    Provision and maximum participants

    On-site or at a specialized venue. The maximum class size is 12 attendees.


    • Challenges of operating a data service in the field: an overview
    • Managing transportation requirements
    • Power and communication links
    • Selecting a suitable data collection architecture
    • Ad hoc networking and data collection
    • Distributing computing tasks between the home base and the field facility
    • Safeguarding data and providing adequate information on data protection

Coming soon: online training courses

Demand for our in-person training courses is high, and our trainers are often booked months in advance. To let as many people as possible benefit from CBRD's expertise, we have developed EpiTrain, a rich multimedia-based way to deliver our courses. EpiTrain courses are typically smaller and much more affordable, with course fees beginning at £100 per course! EpiTrain is designed to provide students with the same rigorous, practical education as classroom courses, but on their own schedule. Certificates will be identical to those for in-person training courses, and are equally widely accepted as credentials of outstanding practical computational epidemiology skills training.

September 2018
  • Data manipulation in R
  • Visualizing data in Python using matplotlib
  • Working with genomic data in Bioconductor
  • A deep dive in epipy
  • Neo4j: basics for the practising epidemiologist
December 2018
  • Network analysis in Python using NetworkX
  • ggplot2 for publication-ready plots in epidemiology
  • Pharmacovigilance disproportionality metrics in Python
  • Visualizing genomic data using circos
  • Neo4j: basics for the practising epidemiologist