CSDS

SHORT COURSES

Title: Data Science and Big Data with R
The volume of data generated in different spheres and circulating on the Internet has been growing exponentially, making it difficult for companies and institutions to store all this information and analyze it. Two new areas of computing have emerged in response: Data Science and Big Data. Data Science, whose purpose is to analyze massive amounts of data, yields a deeper understanding of the data to support decision making, whether through predictive analysis of what may happen or simply as an objective answer to a given problem. Its applications span many fields, from industry to education. The methods and techniques this new science uses come from computer science and statistics. Big Data, in turn, is a term for any collection of data sets so large or complex that it becomes difficult to process with the traditional methods of Data Science. The two areas are thus closely linked, since both share the same starting point: making it easier to work with data.
R is an environment designed from the outset for Data Science. It offers a wide variety of statistical techniques and graphical capabilities. This course gives a simplified overview of the main tools the R language provides for working with Data Science and Big Data.
Objectives:

Exploratory analysis using the tidyverse package
Study of the main functions of the sparklyr package

Anderson Luiz Ara Souza (DE-UFBA).

Title: Introduction to Bayesian Classifiers
Bayesian classifiers, also known as probabilistic classifiers or Bayesian network classifiers, are a class of probabilistic algorithms that apply Bayes' theorem, from either a classical or a Bayesian point of view, to learn the underlying probability distribution of the data. They are frequently used as supervised learning algorithms in statistical machine learning across many real-world applications. This course presents the theory and applications of the naive Bayes (NB), tree augmented naive Bayes (TAN), k-dependence Bayesian network (KDB), and averaged one-dependence estimator (AOE) algorithms.
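As a rough illustration of the simplest of these algorithms, the sketch below implements a categorical naive Bayes classifier with Laplace smoothing. It is not taken from the course material: the function names (`train_naive_bayes`, `predict`), the smoothing parameter `alpha`, and the toy weather data are all assumptions made for this example, and Python is used only for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels, alpha=1.0):
    """Count classes and per-class feature values (hypothetical helper)."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    feature_values = defaultdict(set)      # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            feature_counts[(j, label)][value] += 1
            feature_values[j].add(value)
    return class_counts, feature_counts, feature_values, alpha


def predict(model, row):
    """Return the class with the highest posterior log-score."""
    class_counts, feature_counts, feature_values, alpha = model
    total = sum(class_counts.values())
    scores = {}
    for label, n_c in class_counts.items():
        # log P(class) + sum over features of log P(value | class),
        # assuming the features are conditionally independent given the class.
        log_score = math.log(n_c / total)
        for j, value in enumerate(row):
            count = feature_counts[(j, label)][value]
            n_values = len(feature_values[j])
            log_score += math.log((count + alpha) / (n_c + alpha * n_values))
        scores[label] = log_score
    return max(scores, key=scores.get)


# Toy, made-up data: (outlook, temperature) -> play outside?
rows = [
    ("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
    ("rainy", "cool"), ("overcast", "hot"), ("overcast", "cool"),
]
labels = ["no", "no", "yes", "yes", "yes", "yes"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("rainy", "cool")))
print(predict(model, ("sunny", "hot")))
```

TAN, KDB, and AOE relax the independence assumption made in the inner loop by letting each feature depend on other features as well as on the class.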

Julio Trecenti (Partner at Curso-R and President of CONRE-3).

Title: Deep Learning with R
In this workshop we will discuss: i) what deep neural networks are and how they work; ii) which software tools are used to train these models and how they relate to one another; iii) how to train deep learning models for some prediction problems. The goal of the workshop is to become familiar with the main techniques used in deep learning and to learn enough to study the subject in greater depth later. By the end of the workshop, participants will be able to apply what they have learned to simple problems of text classification, image classification, and other predictive tasks.

Crysttian Arantes Paixão (Federal University of Santa Catarina, Brazil).

Title: Regular expressions for database manipulation
This course presents the basic concepts of using regular expressions for database manipulation. It covers the main applications in preparing data for analysis with R, with an emphasis on collecting information from the internet (web scraping).
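To give a flavor of the idea, the sketch below uses a regular expression to turn a scraped HTML fragment into structured records. The course itself uses R; Python is used here only for illustration, and the HTML fragment, URLs, titles, and dates are hypothetical.

```python
import re

# Hypothetical fragment of a scraped page (not real data).
html = """
<ul>
  <li><a href="/items/first">First item</a> - 01/02/2020</li>
  <li><a href="/items/second">Second item</a> - 15/03/2020</li>
</ul>
"""

# Capture the link target, the link text, and a dd/mm/yyyy date from each item.
pattern = re.compile(
    r'<a href="([^"]+)">([^<]+)</a>\s*-\s*(\d{2})/(\d{2})/(\d{4})'
)

# Build one record per match, rewriting the date as yyyy-mm-dd.
records = [
    {"url": url, "title": title, "date": f"{year}-{month}-{day}"}
    for url, title, day, month, year in pattern.findall(html)
]

for record in records:
    print(record)
```

Regular expressions work well for small, regular fragments like this one; for whole pages, a dedicated HTML parser is usually the more robust choice, with regular expressions applied to the extracted text.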

Jorge Guerra Pires (Federal University of Bahia, Brazil).

Title: Data Science and Biomathematics: an introduction to mathematical modeling applied to biological systems with Matlab.
Mathematical models applied to biological systems have been used extensively in the medical and biological sciences in recent years. Such models can be built from "scratch" (bottom-up approaches, also known as "white box") or from "data" (top-down approaches, also known as "black box"). Mathematical models can be used to make sense of data, much as machine learning approaches are, with the additional advantage of providing a better mathematical understanding of the system. Top-down approaches are represented by tools such as neural networks, whereas bottom-up approaches are represented by tools such as differential equations.
In this short course you will learn the basics of mathematical models applied to biological systems and how they can be used to interpret and exploit experimental data, as a possible alternative to well-known data science methodologies.
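As a minimal sketch of a bottom-up ("white box") model, the example below integrates the logistic growth equation dN/dt = r N (1 - N/K) with a classical fourth-order Runge-Kutta scheme and compares the result with the known closed-form solution. The course uses Matlab; Python is used here only for illustration, and the choice of model, the function names, and the parameter values are assumptions made for this example.

```python
import math


def logistic_rhs(n, r, k):
    """Right-hand side of dN/dt = r * N * (1 - N / K)."""
    return r * n * (1.0 - n / k)


def rk4(f, y0, t_end, steps):
    """Integrate the autonomous ODE y' = f(y) from t = 0 to t_end with RK4."""
    h = t_end / steps
    y = y0
    values = [y]
    for _ in range(steps):
        k1 = f(y)
        k2 = f(y + 0.5 * h * k1)
        k3 = f(y + 0.5 * h * k2)
        k4 = f(y + h * k3)
        y += (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        values.append(y)
    return values


# Illustrative parameters: growth rate, carrying capacity, initial population.
r, k, n0 = 0.5, 100.0, 5.0
t_end = 20.0

trajectory = rk4(lambda n: logistic_rhs(n, r, k), n0, t_end, steps=200)

# Closed-form solution of the logistic equation, used as a check.
exact_final = k / (1.0 + (k - n0) / n0 * math.exp(-r * t_end))

print(f"numerical N({t_end}) = {trajectory[-1]:.4f}")
print(f"exact     N({t_end}) = {exact_final:.4f}")
```

Every parameter here has a direct biological reading (growth rate, carrying capacity), which is the kind of interpretability that bottom-up models offer over black-box fits.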
Prerequisites: basic programming skills are helpful.
You may want to watch https://www.youtube.com/watch?v=2lTp6zLMgQI for Matlab and https://www.youtube.com/watch?v=1kGvq8vABvw for SimBiology. SimBiology is a tool inside Matlab, similar to Simulink. If possible, please send me feedback in advance: https://docs.google.com/forms/d/1m6YeyXpX_hgKyca5AS8Qq-pnG3BRLlSQCOZaktVbv5A/viewform?edit_requested=true
The course also includes the following activities:

Stochastic models in medicine and life sciences: a short-term dynamics for ghrelin. Oral presentation: a case study of modeling a biological system.
Data Science and Mathematical Modeling applied to biological systems: how mathematical models can be advantageous for data scientists and statisticians. Round table: a discussion of key issues in mathematical modeling and data science.

First Conference on Statistics and Data Science

Salvador, 12-14 November, 2018
