Data Science in general
Tutorials, documentation, books
Data Science Primer: Basic Concepts for Beginners
This collection of concise introductory data science tutorials covers topics including the difference between data mining and statistics, supervised vs. unsupervised learning, and the types of patterns we can mine from data.
Machine Learning is Fun!
A set of 8 articles on Medium. Beginner level, with good explanations and intuitions for the different concepts.
Have you heard people talking about machine learning but only have a fuzzy idea of what that means? Are you tired of nodding your way through conversations with co-workers? Let’s change that!
This guide is for anyone who is curious about machine learning but has no idea where to start. I imagine there are a lot of people who tried reading the Wikipedia article, got frustrated and gave up, wishing someone would just give them a high-level explanation. That’s what this is.
Cheat Sheets for AI, Neural Networks, Machine Learning, Deep Learning & Big Data
The Most Complete List of Best AI Cheat Sheets. Over the past few months, I have been collecting AI cheat sheets. From time to time I share them with friends and colleagues and recently I have been getting asked a lot, so I decided to organize and share the entire collection. To make things more interesting and give context, I added descriptions and/or excerpts for each major topic.
Machine Learning for Humans
Simple, plain-English explanations accompanied by math, code, and real-world examples.
[Update 9/2/17] This series is now available as a full-length e-book!
For inquiries, please contact ml4humans@gmail.com.
Roadmap
- Part 1: Why Machine Learning Matters. The big picture of artificial intelligence and machine learning — past, present, and future.
- Part 2.1: Supervised Learning. Learning with an answer key. Introducing linear regression, loss functions, overfitting, and gradient descent.
- Part 2.2: Supervised Learning II. Two methods of classification: logistic regression and SVMs.
- Part 2.3: Supervised Learning III. Non-parametric learners: k-nearest neighbors, decision trees, random forests. Introducing cross-validation, hyperparameter tuning, and ensemble models.
- Part 3: Unsupervised Learning. Clustering: k-means, hierarchical. Dimensionality reduction: principal components analysis (PCA), singular value decomposition (SVD).
- Part 4: Neural Networks & Deep Learning. Why, where, and how deep learning works. Drawing inspiration from the brain. Convolutional neural networks (CNNs), recurrent neural networks (RNNs). Real-world applications.
- Part 5: Reinforcement Learning. Exploration and exploitation. Markov decision processes. Q-learning, policy learning, and deep reinforcement learning. The value learning problem.
- Appendix: The Best Machine Learning Resources. A curated list of resources for creating your machine learning curriculum.
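Part 2.1 of the roadmap pairs linear regression with a loss function and gradient descent. As a rough illustration of how those pieces fit together, here is a minimal sketch in plain Python (no libraries) that fits y ≈ w*x + b by repeatedly stepping against the gradient of the mean squared error. The function names, learning rate and number of epochs are hypothetical choices of mine, not code from the series.

```python
def gradient_descent_linreg(xs, ys, lr=0.01, epochs=5000):
    """Fit y ~ w*x + b by minimizing mean squared error with batch gradient descent."""
    n = len(xs)
    w, b = 0.0, 0.0  # start from a flat line through the origin
    for _ in range(epochs):
        # Residuals (prediction minus target) of the current model
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(err^2) with respect to w and b
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Move both parameters a small step against the gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def mse(xs, ys, w, b):
    """Mean squared error of the line w*x + b on the given points."""
    return sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    xs = [0, 1, 2, 3, 4]
    ys = [1, 3, 5, 7, 9]  # generated exactly by y = 2x + 1
    w, b = gradient_descent_linreg(xs, ys)
    print(round(w, 4), round(b, 4), mse(xs, ys, w, b))
```

On data generated exactly by y = 2x + 1, the fit should land very close to w = 2 and b = 1. The learning rate has to be small enough for the updates to converge; with too large a value the loss grows instead of shrinking.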
Articles from the Unsupervised Methods website
The Unsupervised Methods website has several articles that compile very interesting information: "curated lists" of articles, courses, bloggers, tutorials, etc. These are the articles that caught my attention:
- My Curated List of AI and Machine Learning Resources from Around the Web: it only includes links to free content. There is enough free content to keep you busy for a while. It’s amazing just how much information is available on machine learning, deep learning, and artificial intelligence on the web. This article should give you a sense of the scope. I’ve created sections below that contain: well-known researchers, AI organizations, video courses, bloggers, Medium writers, books, YouTube channels, Quora topics, subreddits, Github repos, podcasts, newsletters, conferences, research links, tutorials, and cheat sheets.
- Over 150 of the Best Machine Learning, NLP, and Python Tutorials I’ve Found: a list of the best tutorial content that I’ve found so far. It’s by no means an exhaustive list of every ML-related tutorial on the web — that would be overwhelming and duplicative. Plus, there is a bunch of mediocre content out there. My goal was to link to the best tutorials I found on the important subtopics within machine learning and NLP. I’ve split this post into four sections: Machine Learning, NLP, Python, and Math.
- Cheat Sheet of Machine Learning and Python (and Math) Cheat Sheets: There are many facets to Machine Learning. As I started brushing up on the subject, I came across various “cheat sheets” that compactly listed all the key points I needed to know for a given topic. Eventually, I compiled over 20 Machine Learning-related cheat sheets. Some I reference frequently and thought others may benefit from them too. This post contains 27 of the better cheat sheets I’ve found on the web. Let me know if I’m missing any you like. Given how rapidly the Machine Learning space is evolving, I imagine these will go out of date quickly, but at least as of June 1, 2017, they are pretty current.
Books
- An Introduction to Statistical Learning with Applications in R
- ISLR-python: This repository contains Python code for a selection of tables, figures and LAB sections from the book 'An Introduction to Statistical Learning with Applications in R' by James, Witten, Hastie, Tibshirani (2013).
Links
- (21/02/22) KDnuggets - The Complete Collection of Data Science Cheat Sheets
- 3Blue1Brown: Linear Algebra: a playlist on linear algebra that explains the geometric concepts behind the operations very well
- 3Blue1Brown: 3blue1brown, or 3b1b for those who prefer less of a tongue-twister, centers around presenting math with a visuals-first approach.
- Build your own Robust Deep Learning Environment in Minutes: this guide is meant to ease you into racing through the less desirable aspects of setting up your own deep learning environment.
- Data Versioning
- Haystacks - A data science blog by Caitlin Hudon
- Learn Data Science: Data analytics and machine learning tutorials created by data scientists.
- EXXETA data science and machine learning blog
- Getting to Know Keras for New Data Scientists: Keras is a powerful and easy-to-use Python library for developing and evaluating deep learning models. In this article, we’ll lay out the welcome mat to the framework. You should walk away with a handful of useful features to keep in mind as you get up to speed.
- Linear Regression in 6 lines of Python
- Pete Warden: has a pretty good blog with interesting opinions on machine learning topics; lately he writes about machine learning on embedded devices. He is also the author of two websites: one with open data tools and another for making heat maps from CSV files
- Pete Warden's blog
- Data Science Toolkit: all the code is on GitHub and the site can be cloned into your own virtual machine
- OpenHeatMap
- Machine Learning Zero-to-Hero: Everything you need in order to compete on Kaggle for the first time, step-by-step!
- Object detection: an overview in the age of Deep Learning
- Essential Cheat Sheets for Machine Learning and Deep Learning Engineers: also has a GitHub repository
- Cheat Sheets for AI, Neural Networks, Machine Learning, Deep Learning & Big Data: cheat sheets for neural network topologies, NumPy, scikit-learn, Python basics for data science, pandas, Bokeh, TensorFlow, Keras, data wrangling with pandas, data wrangling with dplyr and tidyr, linear algebra with SciPy, Matplotlib, data visualization with ggplot2, algorithm complexity, etc.
- Probabilistic programming from scratch: probabilistic programming, Bayesian inference
- Jake Vanderplas - Statistics for Hackers - PyCon 2016
- ¿Pedaleas en la ciudad?: Analiza con Excel la seguridad de los ciclistas en Madrid (Do you cycle in the city? Analyze cyclist safety in Madrid with Excel): an article from the LUCA blog that performs a descriptive analysis of bicycle accident data downloaded from the Madrid City Council open data portal. The analysis is done in Excel, and it includes information on advanced features and functions of pivot tables
- Tus datos más limpios...(II). Excel, "Waterproof" (Your cleaner data... (II). Excel, "Waterproof"): another article from the LUCA blog; this one covers data processing and cleaning using Excel
- Tus datos más limpios, casi sin frotar (Your cleaner data, almost without scrubbing): a somewhat weak LUCA blog article on the problem of data cleaning, but it mentions the data cleaning tools OpenRefine, Trifacta Wrangler and DataCleaner. OpenRefine is an open source tool that was formerly called Google Refine
- Presto
- (23/05/21) Automated Data Wrangling: A growing array of techniques apply machine learning directly to the problems of data wrangling. They often start out as open research projects but then become proprietary. How can we build automated data wrangling systems for open data?
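Jake Vanderplas's "Statistics for Hackers" talk argues, roughly, that if you can write a for loop you can replace many classical tests with simulation, for example by shuffling group labels. Below is a minimal sketch of that shuffling idea as a one-sided permutation test on the difference of means, in plain Python. The function name, the number of shuffles and the example data are hypothetical; this is not code from the talk.

```python
import random


def permutation_test(group_a, group_b, n_shuffles=10000, seed=0):
    """One-sided permutation test: how often does a random relabeling give a
    difference of means at least as large as the observed one?"""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    n_a = len(group_a)
    observed = sum(group_a) / n_a - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    count = 0
    for _ in range(n_shuffles):
        # Shuffle the pooled values and split them back into two groups
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a)
        if diff >= observed:
            count += 1
    # Fraction of shuffles at least as extreme as the observed difference
    return observed, count / n_shuffles


if __name__ == "__main__":
    obs, p = permutation_test([10, 11, 12, 13, 14], [1, 2, 3, 4, 5])
    print(obs, p)
```

With these example groups the observed difference is 9.0, and only one of the 252 ways to split the ten pooled values into two groups of five reproduces it, so the estimated p-value should come out around 0.004.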