Top 8 R Machine Learning Projects

textfeatures
Project mention: Using dictionaries to check text in R, language processing | reddit.com/r/rstats | 2021-08-18
TextFeatures can be used to create a summary table of occurrences of text features like number of unique words, etc. Don't know if it does nouns or verbs.
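
A minimal sketch of the kind of summary table described above. The example sentences are made up, and the argument names follow the textfeatures README as recalled, so check the current package documentation before relying on them:

```r
library(textfeatures)

# Hypothetical example texts, not taken from the original thread
texts <- c(
  "The quick brown fox jumps over the lazy dog.",
  "R is great for text processing, isn't it?",
  "Short text."
)

# Skip sentiment scores and word-vector dimensions to keep only the
# count-style features; normalize = FALSE leaves the counts unscaled
feats <- textfeatures(
  texts,
  sentiment = FALSE,
  word_dims = 0,
  normalize = FALSE,
  verbose = FALSE
)

# One row per input text; columns such as n_uq_words hold the counts
print(feats)
```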

ParBayesianOptimization
Project mention: [D] Selecting Hyperparameters Using Bayesian Optimization | reddit.com/r/statistics | 2021-03-03
Disclaimer: I am the maintainer of ParBayesianOptimization. That readme has a pretty good walkthrough of how Bayesian optimization works.
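
For a rough idea of what using the package looks like, here is a sketch that maximizes a toy one-dimensional function. The objective, bounds, seed, and iteration counts are illustrative assumptions, not taken from the thread:

```r
library(ParBayesianOptimization)

# Hypothetical toy objective with several local maxima
toy_objective <- function(x) {
  dnorm(x, 3, 2) * 1.5 + dnorm(x, 7, 1) + dnorm(x, 10, 2)
}

# The scoring function must return a list with a `Score` element,
# which bayesOpt() tries to maximize
scoring_fun <- function(x) list(Score = toy_objective(x))

# Search range for each parameter, as a named list
bounds <- list(x = c(0, 15))

set.seed(6)
opt <- bayesOpt(
  FUN = scoring_fun,
  bounds = bounds,
  initPoints = 4,  # random points evaluated before the surrogate is fitted
  iters.n = 3      # further evaluations guided by the surrogate model
)

# Best parameter values found so far
best <- getBestPars(opt)
print(best)
```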



miceRanger
I developed miceRanger because the mice package uses a really slow implementation of random forests. It has a bunch of plotting capabilities and can impute new datasets without retraining the models used in the mice procedure.
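
A short sketch of the "impute new data without retraining" workflow mentioned above, assuming the built-in iris data with values removed at random. The dataset, seed, and parameter values are illustrative choices, not from the comment:

```r
library(miceRanger)

set.seed(1)

# Randomly remove 25% of the values from iris to create missing data
amp_iris <- amputeData(iris, perc = 0.25)

# Run the MICE procedure with random forests; returnModels = TRUE keeps
# the fitted models so they can be reused on new data later
mice_obj <- miceRanger(
  amp_iris,
  m = 2,
  maxiter = 2,
  returnModels = TRUE,
  verbose = FALSE
)

# Extract the completed (imputed) datasets, one per value of m
completed <- completeData(mice_obj)

# A "new" dataset with missing values, imputed with the saved models
# rather than by refitting them
new_data <- amputeData(iris, perc = 0.25)
new_imputed <- impute(new_data, mice_obj, verbose = FALSE)
```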

lmtp
📦 Nonparametric Causal Effects of Feasible Interventions Based on Modified Treatment Policies 🔮
Project mention: [Q] Should G-methods, IPTW always be used over traditional regression? | reddit.com/r/statistics | 2021-09-12
The tlverse/sl3 super learner library is much better integrated and a lot more powerful (a bit more complicated in the beginning, but once you understand it, it's great). LMTP has a separate branch that uses sl3: https://github.com/ntwilliams/lmtp/tree/sl3devel. To specify formulas in sl3, you just do Lrnr_glmnet$new(formula = ~ 1 + W + A + A*W), but make sure to download the "dev" version: devtools::install_github("tlverse/sl3", ref = "devel").
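
Laid out as a script, the two calls from the comment above look like this. W and A are the commenter's placeholder column names (presumably a covariate and a treatment), and the `formula` argument is assumed to require the devel branch, as the comment says:

```r
# Install the development version of sl3, as recommended in the comment
# (needs the devtools package and network access)
devtools::install_github("tlverse/sl3", ref = "devel")

library(sl3)

# A glmnet learner with main effects for W and A plus their interaction;
# W and A are placeholder column names from the comment
lrnr <- Lrnr_glmnet$new(formula = ~ 1 + W + A + A * W)
```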

tmle3mopttx
🎯 💯 Targeted Learning and Variable Importance for the Causal Effect of an Optimal Individualized Treatment Intervention
Project mention: [D] Is there a such thing as "Prespective Statistical Models"? | reddit.com/r/statistics | 2021-09-21
This package and the references therein allow for nonparametric estimation and inference for the optimal dynamic treatment: https://github.com/tlverse/tmle3mopttx.

causalglm
Interpretable and model-robust causal inference for heterogeneous treatment effects using generalized linear working models with targeted machine-learning
Project mention: [Q] Sensitivity of (Causal) Inference to Nonlinear Functional Form | reddit.com/r/statistics | 2021-09-28
Why not both? https://tlverse.org/causalglm/ (Will replace this with a more informative comment when I have free time later today)


ggshakeR
Project mention: ggshakeR - R's first all-in-one data analysis and visualization package on open soccer data | reddit.com/r/rstats | 2021-10-18

Index
What are some of the best open-source Machine Learning projects in R? This list will help you:
#  Project                  Stars
1  textfeatures             151
2  ParBayesianOptimization  69
3  mlr3learners             60
4  miceRanger               38
5  lmtp                     25
6  tmle3mopttx              7
7  causalglm                6
8  ggshakeR                 5