Data Science Courses

Foundations of Data Science

Foundations of Data Science: A Data Science Course for Everyone is an introductory course from the University of California, Berkeley. It is suitable for entry-level students at any university and has been specifically designed for students who have not previously taken any course on statistics or data science. All materials are available on the course website, including the course textbook, the lecture videos, and the slides.

The course makes extensive use of Jupyter Notebook, which is explained in the article The course of the future – and the technology behind it.

Johns Hopkins’ Data Science Specialization

Data Science Specialization: Launch Your Career in Data Science. A nine-course introduction to data science, developed and taught by leading professors.

Ask the right questions, manipulate data sets, and create visualizations to communicate results. This Specialization covers the concepts and tools you'll need throughout the entire data science pipeline, from asking the right kinds of questions to making inferences and publishing results.

In the final Capstone Project, you’ll apply the skills learned by building a data product using real-world data. At completion, students will have a portfolio demonstrating their mastery of the material.

Neural Networks for Machine Learning

Neural Networks for Machine Learning: Learn about artificial neural networks and how they're being used for machine learning, as applied to speech and object recognition, image segmentation, modeling language and human motion, etc. We'll emphasize both the basic algorithms and the practical tricks needed to get them to work well.

This course contains the same content presented on Coursera beginning in 2013. It is not a continuation or update of the original course. It has been adapted for the new platform.

Please be advised that the course is suited for an intermediate-level learner who is comfortable with calculus and has programming experience (Python).

"Machine Learning for Software Engineers" study plan

A Vietnamese software engineer, Nam Vu, has designed a study plan for becoming a data scientist. The plan assumes that a software engineer has not received as much mathematical training as an American Computer Science undergraduate. It is strongly oriented toward getting practical results from models and achieving the intended goal (prediction, classification, etc.), and less toward the theoretical foundations behind the various techniques and models (all of this according to the author's own description of the study plan).

On Medium there are two articles in which the author explains the study plan, the motivations for creating it, the goals he pursued, and advice for anyone who wants to follow it. The articles are How I plan to become a machine learning engineer and Top-down learning path: Machine Learning for Software Engineers. The study plan itself is in the GitHub repository Top-down learning path: Machine Learning for Software Engineers, which has 16k+ stars.

Andrew Ng's Machine Learning course

Andrew Ng is the former chief scientist at Baidu and an adjunct professor at Stanford University. He is perhaps best known, though, as the co-founder and chairman of Coursera.

At Coursera, Ng started one of the most famous Machine Learning MOOCs (Massive Open Online Courses) to date. While it can be quite “dumbed down”, it is still highly regarded as one of the best places to jump into machine learning. His course can be found here.

Carnegie Mellon University Machine Learning course (Prof. Tom Mitchell)

This course covers the theory and practical algorithms for machine learning from a variety of perspectives. We cover topics such as Bayesian networks, decision tree learning, Support Vector Machines, statistical learning methods, unsupervised learning and reinforcement learning. The course covers theoretical concepts such as inductive bias, the PAC learning framework, Bayesian learning methods, margin-based learning, and Occam's Razor. Short programming assignments include hands-on experiments with various learning algorithms. This course is designed to give a graduate-level student a thorough grounding in the methodologies, technologies, mathematics and algorithms currently needed by people who do research in machine learning.

The video lectures of the course are here.

Udacity's Machine Learning course

Like Andrew Ng, Sebastian Thrun is a Stanford professor and the co-founder and chairman of a MOOC provider, Udacity. He is also the CEO of the Kitty Hawk Corporation, and founded Google X and Google’s self-driving car team.

A rival to Coursera, Udacity is also one of the leading MOOC providers and has its own machine learning course. Udacity’s machine learning course is a tad more difficult than Coursera’s, so it would be wiser to start with Andrew Ng’s and then move on to this one. The course can be found here.

fast.ai : Making neural nets uncool again

fast.ai is dedicated to making the power of deep learning accessible to all. Deep learning is dramatically improving medicine, education, agriculture, transport and many other fields, with the greatest potential impact in the developing world. For its full potential to be met, the technology needs to be much easier to use, more reliable, and more intuitive than it is today.

The website has a blog with articles by Rachel Thomas (co-founder of fast.ai) and several learning resources, for example on linear algebra. The course is here.

CS231n: Convolutional Neural Networks for Visual Recognition

Computer Vision has become ubiquitous in our society, with applications in search, image understanding, apps, mapping, medicine, drones, and self-driving cars. Core to many of these applications are visual recognition tasks such as image classification, localization and detection. Recent developments in neural network (aka “deep learning”) approaches have greatly advanced the performance of these state-of-the-art visual recognition systems. This course is a deep dive into details of the deep learning architectures with a focus on learning end-to-end models for these tasks, particularly image classification. During the 10-week course, students will learn to implement, train and debug their own neural networks and gain a detailed understanding of cutting-edge research in computer vision. The final assignment will involve training a multi-million parameter convolutional neural network and applying it on the largest image classification dataset (ImageNet). We will focus on teaching how to set up the problem of image recognition, the learning algorithms (e.g. backpropagation), practical engineering tricks for training and fine-tuning the networks and guide the students through hands-on assignments and a final course project. Much of the background and materials of this course will be drawn from the ImageNet Challenge.

These notes accompany the course: tutorials about Python, iPython, Google Cloud, Google GPUs, neural networks, convolutional neural networks. Worth reading.

CS 20SI: Tensorflow for Deep Learning Research

Tensorflow is a powerful open-source software library for machine learning developed by researchers at Google Brain. It has many pre-built functions to ease the task of building different neural networks. Tensorflow allows distribution of computation across different computers, as well as multiple CPUs and GPUs within a single machine. TensorFlow provides a Python API, as well as a less documented C++ API. For this course, we will be using Python.

This course will cover the fundamentals and contemporary usage of the Tensorflow library for deep learning research. We aim to help students understand the graphical computational model of Tensorflow, explore the functions it has to offer, and learn how to build and structure models best suited for a deep learning project. Through the course, students will use Tensorflow to build models of different complexity, from simple linear/logistic regression to convolutional neural network and recurrent neural networks with LSTM to solve tasks such as word embeddings, translation, optical character recognition. Students will also learn best practices to structure a model and manage research experiments.
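As a rough illustration of the simplest model mentioned above, linear regression fitted by gradient descent can be sketched in plain Python. This is not TensorFlow code and is not taken from the course materials; the function name `fit_linear`, the learning rate, the number of epochs, and the toy data are all assumptions chosen for the example.

```python
def fit_linear(xs, ys, lr=0.05, epochs=2000):
    """Fit y ≈ w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data (hypothetical), generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_linear(xs, ys)
```

A library such as TensorFlow computes these gradients automatically from the model definition, which is what makes the same training loop practical for much larger models.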

Resources: course homepage, detailed syllabus and class notes, and GitHub repository.

42 Steps to Mastering Data Science

This post is a collection of 6 separate posts of 7 steps apiece, each for mastering and better understanding a particular data science topic, with topics ranging from data preparation, to machine learning, to SQL databases, to NoSQL and beyond.

7 Steps to Mastering Data Preparation with Python

Data preparation, cleaning, pre-processing, cleansing, wrangling. Whatever term you choose, they refer to a roughly related set of pre-modeling data activities in the machine learning, data mining, and data science communities.
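To make these activities concrete, here is a minimal sketch using only the Python standard library. It is not taken from the post; the records, the `clean` function, and the choice of mean imputation are assumptions made for illustration.

```python
from statistics import mean

# Hypothetical raw records with inconsistent text, a missing value and a duplicate.
raw = [
    {"name": " Ana ", "age": "34"},
    {"name": "Luis", "age": ""},       # missing age
    {"name": "ana", "age": "34"},      # same as the first record once normalized
    {"name": "Mei", "age": "28"},
]


def clean(records):
    """Normalize names, parse ages, drop duplicates and impute missing ages."""
    cleaned, seen = [], set()
    for r in records:
        name = r["name"].strip().title()
        age = int(r["age"]) if r["age"].strip() else None
        key = (name, age)
        if key in seen:                # skip duplicates after normalization
            continue
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    # Replace missing ages with the mean of the observed ones.
    fill = mean(r["age"] for r in cleaned if r["age"] is not None)
    for r in cleaned:
        if r["age"] is None:
            r["age"] = fill
    return cleaned


prepared = clean(raw)
```

In practice a library such as pandas is the usual tool for this kind of work; the sketch only shows the kind of decisions involved (normalization, deduplication, imputation).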

7 Steps to Mastering Machine Learning With Python

This post aims to take a newcomer from minimal knowledge of machine learning in Python all the way to knowledgeable practitioner in 7 steps, all while using freely available materials and resources along the way. The prime objective of this outline is to help you wade through the numerous free options that are available; there are many, to be sure, but which are the best? Which complement one another? What is the best order in which to use selected resources?

7 More Steps to Mastering Machine Learning With Python

After a quick review -- and a few options for a fresh perspective -- this post will focus more categorically on several sets of related machine learning tasks. Since we can safely skip the foundational modules this time around -- Python basics, machine learning basics, etc. -- we will jump right into the various machine learning algorithms. We can also categorize our tutorials better along functional lines this time.

7 Steps to Understanding Deep Learning

This collection of reading materials and tutorials aims to provide a path for a deep neural networks newcomer to gain some understanding of this vast and complex topic. Though I do not assume any real understanding of neural networks or deep learning, I will assume your familiarity with general machine learning theory and practice to some degree. To overcome any deficiency you may have in the general areas of machine learning theory or practice you can consult the recent KDnuggets post 7 Steps to Mastering Machine Learning With Python. Since we will also see examples implemented in Python, some familiarity with the language will be useful. Introductory and review resources are also available in the previously mentioned post.

7 Steps to Mastering SQL for Data Science

Clearly, SQL is important in data science. As such, this post aims to take a reader from SQL newbie to competent practitioner in a short time, using freely-available online resources. Lots of such resources exist on the internet, but mapping out a path from start to finish, using items which complement each other, is not always as straightforward as it may seem. Hopefully this post can be of assistance in this manner.
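For a quick feel of what SQL looks like in a data science setting, the following sketch runs an aggregate query through Python's built-in `sqlite3` module. The `courses` table and its rows are illustrative placeholders, not data from the post.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE courses (title TEXT, platform TEXT)")
conn.executemany(
    "INSERT INTO courses (title, platform) VALUES (?, ?)",
    [
        # Illustrative rows only.
        ("Machine Learning", "Coursera"),
        ("Neural Networks for Machine Learning", "Coursera"),
        ("Intro to Machine Learning", "Udacity"),
    ],
)

# Count courses per platform, a typical GROUP BY aggregation.
rows = conn.execute(
    "SELECT platform, COUNT(*) FROM courses GROUP BY platform ORDER BY platform"
).fetchall()
conn.close()
```

Here `rows` holds one `(platform, count)` tuple per platform, sorted by platform name.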

7 Steps to Understanding NoSQL Databases

The term NoSQL has come to be synonymous with schema-less, non-relational data storage schemes. NoSQL is an umbrella term, one which encompasses a number of different technologies. These different technologies aren't even necessarily related in any way beyond the single defining characteristic of NoSQL: they are not relational in nature; for right or wrong, Structured Query Language (SQL) has become conflated with relational database management systems over the years.

Kok-Leong Seow, Kseow.com (@KokLeongSeow)

Kok-Leong Seow, a CS graduate student at Columbia University under Hod Lipson, is the youngest on this list. Kok-Leong has worked on creating quantum neural networks and is the creator of KSEOW.com, a leading website that teaches machine learning intuitively.

Kok-Leong takes a very different approach to teaching machine learning. He relies heavily on “weird” analogies and uses old methods like the Method of Loci to give intuition and drive concepts home. Kok-Leong is one of the few people whose content “tells a story” to teach machine learning. The website can be found here.

Andrej Karpathy, karpathy.github.io (@karpathy)

Andrej Karpathy was previously a Research Scientist at OpenAI, and CS graduate student at Stanford University. He is now the director of AI at Tesla, appointed by Elon Musk.

Andrej’s blog, karpathy.github.io, provides good insights on niche subjects in machine learning. It is by no means meant to serve as an introduction or tutorial on machine learning. For more experienced practitioners, however, it can be extremely useful. He even provides other gems like “A Survival Guide to a PhD”. His blog can be found here.

NLP Courses

Hugging Face documentation: documentation for Transformers and more

Hugging Face course: a course on Transformers. This course:

- Requires a good knowledge of Python
- Is better taken after an introductory deep learning course, such as fast.ai’s Practical Deep Learning for Coders or one of the programs developed by DeepLearning.AI
- Does not expect prior PyTorch or TensorFlow knowledge, though some familiarity with either of those will help

After you’ve completed this course, we recommend checking out DeepLearning.AI’s Natural Language Processing Specialization, which covers a wide range of traditional NLP models like naive Bayes and LSTMs that are well worth knowing about!

Links

Predict Future Sales: This challenge serves as the final project for the "How to win a data science competition" Coursera course.
- How to Win a Data Science Competition: Learn from Top Kagglers
- Bayesian Methods for Machine Learning
Learn AI for Free: There’s a skills shortage in AI. Thinking of retraining? Here are some free learning resources to find out if it is for you. This article lists some of the best AI learning materials that are available for free. I’m bound to have missed a few; please let me know your favorites in the comments and why you think they should be included.
How to download Coursera’s courses before they’re gone forever