- Description:
AMIDST is a Java toolbox for scalable probabilistic machine learning.
In AMIDST, you model your problem using a flexible probabilistic language based on graphical models, then fit the model to data using a Bayesian approach that accounts for modelling uncertainty.
AMIDST provides tailored parallel (multi-core) and distributed implementations of Bayesian parameter learning for batch and streaming data. These implementations are based on flexible and scalable message passing algorithms.
MAIN FEATURES
Probabilistic Graphical Models: You can specify your model using probabilistic graphical models with latent variables and temporal dependencies.
Scalable inference: Perform inference on your probabilistic models with powerful approximate and scalable algorithms based on novel variational message passing schemes.
Data Streams: Update your models when new data is available. This makes our toolbox appropriate for learning from (massive) data streams.
Large-scale Data: Use your defined models to process massive data sets in a distributed computer cluster using Flink or Spark.
Extensible: Code your own models or algorithms within AMIDST and extend the toolbox's functionality. This makes it a flexible toolbox for researchers experimenting in machine learning.
Interoperability: Leverage existing functionalities and algorithms by interfacing with other software tools such as Hugin, MOA, Weka and R.
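The Bayesian, stream-oriented learning described in the features above can be illustrated with a minimal sketch in plain Java. It does not use the AMIDST API: the class `BetaPosterior` and its methods are hypothetical names introduced only for this example, and the model is a single Bernoulli variable with a conjugate Beta prior rather than a graphical model. The point it shows is that the posterior obtained from one batch serves as the prior for the next, so the model can be updated as new data arrives without revisiting old data.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical example class: Beta posterior over a Bernoulli success probability. */
final class BetaPosterior {
    final double alpha; // pseudo-count of successes
    final double beta;  // pseudo-count of failures

    BetaPosterior(double alpha, double beta) {
        this.alpha = alpha;
        this.beta = beta;
    }

    /** Returns the posterior after observing one batch; this object acts as the prior. */
    BetaPosterior update(boolean[] batch) {
        int successes = 0;
        for (boolean x : batch) {
            if (x) {
                successes++;
            }
        }
        int failures = batch.length - successes;
        return new BetaPosterior(alpha + successes, beta + failures);
    }

    /** Posterior mean of the success probability. */
    double mean() {
        return alpha / (alpha + beta);
    }

    public static void main(String[] args) {
        // Two batches standing in for a data stream (made-up values).
        List<boolean[]> stream = Arrays.asList(
                new boolean[] {true, true, false},
                new boolean[] {true, false, false, false});

        BetaPosterior posterior = new BetaPosterior(1.0, 1.0); // uniform prior
        for (boolean[] batch : stream) {
            posterior = posterior.update(batch); // old posterior becomes the new prior
            System.out.println("alpha=" + posterior.alpha
                    + " beta=" + posterior.beta
                    + " mean=" + posterior.mean());
        }
    }
}
```

Because the update only adds counts, batches could also be summarised independently and their counts summed afterwards, which is the general idea that makes this kind of learning amenable to parallel and distributed processing.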
PUBLICATIONS (by year)
2016
[1] Masegosa, A. R., Martinez, A. M., and Borchani, H. (2016). Probabilistic Graphical Models on Multi-Core CPUs Using Java 8. In IEEE Computational Intelligence Magazine, Vol. 11, No. 2, pages 41-54. DOI: 10.1109/mci.2016.2532267.
[2] Masegosa, A. R., Martinez, A. M., Ramos-López, D., Langseth, H., Nielsen, T. D., Salmerón, A., Cabañas, R., and Madsen, A. L. (2016). A Java Toolbox for Analysis of MassIve Data STreams using Probabilistic Graphical Models. Poster. Presented at the European Data Forum.
[3] Salmerón, A., Madsen, A. L., Jensen, F., Langseth, H., Nielsen, T. D., Ramos-López, D., Martinez, A. M., and Masegosa, A. R. (2016). Parallel Filter-Based Feature Selection Based on Balanced Incomplete Block Designs. Accepted for ECAI.
[4] Madsen, A. L., Jensen, F., Salmerón, A., Langseth, H., and Nielsen, T. D. (2016). A Parallel Algorithm for Bayesian Network Structure Learning from Large Data Sets. Accepted for Knowledge-Based Systems.
[5] Masegosa, A. R., Martinez, A. M., Langseth, H., Nielsen, T. D., Salmerón, A., Ramos-López, D., and Madsen, A. L. (2016). d-VMP: Distributed Variational Message Passing. Accepted for PGM.
[6] Ramos-López, D., Salmerón, A., Rumi, R., Martinez, A. M., Nielsen, T. D., Masegosa, A. R., Langseth, H., and Madsen, A. L. (2016). Scalable MAP inference in Bayesian networks based on a Map-Reduce approach. Accepted for PGM.
2015
[1] Salmerón, A., Rumi, R., Langseth, H., Madsen, A. L., and Nielsen, T. D. (2015). MPE Inference in Conditional Linear Gaussian Networks. In proceedings of ECSQARU, 15-17 July 2015, Compiegne, France, pages 407-416. DOI: 10.1007/978-3-319-20807-7.
[2] Madsen, A. L. and Salmerón, A. (2015). Analysis of massive data streams using R and AMIDST. In book of abstracts of useR!2015, 30 June - 3 July 2015, Aalborg, Denmark, page 171.
[3] Borchani, H., Martinez, A. M., Masegosa, A., Langseth, H., Nielsen, T. D., Salmerón, A., Fernández, A., Madsen, A. L., and Sáez, R. (2015). Modeling concept drift: A probabilistic graphical model based approach. In proceedings of The Fourteenth International Symposium on Intelligent Data Analysis, 22-24 October 2015, Saint-Etienne, France, pages 72-83.
[4] Salmerón, A., Ramos-López, D., Borchani, H., Martinez, A. M., Masegosa, A., Fernández, A., Langseth, H., Madsen, A. L., and Nielsen, T. D. (2015). Parallel importance sampling in conditional linear Gaussian networks. The XVI Conference of the Spanish Association for Artificial Intelligence (CAEPIA'15), pages 36-46.
[5] Madsen, A. L., Jensen, F., Salmerón, A., Langseth, H., and Nielsen, T. D. (2015). Parallelization of the PC Algorithm. The XVI Conference of the Spanish Association for Artificial Intelligence (CAEPIA'15), pages 14-24.
[6] Borchani, H., Martinez, A. M., Masegosa, A., Langseth, H., Nielsen, T. D., Salmerón, A., Fernández, A., Madsen, A. L., and Sáez, R. (2015). Dynamic Bayesian modeling for risk prediction in credit operations. The 13th Scandinavian Conference on Artificial Intelligence, Halmstad, Sweden, November 5-6, 2015, pages 72-83.
[7] Masegosa, A., Martinez, A. M., Borchani, H., Ramos-López, D., Nielsen, T. D., Langseth, H., Salmerón, A., and Madsen, A. L. (2015). AMIDST: Analysis of MassIve Data STreams. In proceedings of The 27th Benelux Conference on Artificial Intelligence, Hasselt, Belgium, November 5-6, 2015.
- Changes to previous version:
- Added the sparklink module, which implements the integration with Apache Spark.
- Fluent pattern in latent-variable-models
- Predefined model implementing concept drift detection
Detailed information can be found on the toolbox's web page.
- Supported Operating Systems: Platform Independent
- Data Formats: ARFF, JSON, Parquet
- Tags: Approximate Inference, Bayesian Networks, Data Streams, Multi Core, Bayesian Learning, Hidden Markov Models, Importance Sampling, Maximum Likelihood, Parallelisation, Variational Message Passing, Kalma