- Description:
A thin Python wrapper that uses the javabridge Python library to communicate with a Java Virtual Machine executing Weka API calls. Offers all major APIs, like data generators, loaders, savers, filters, classifiers, clusterers, attribute selection, associations and experiments. Weka packages can be listed/installed/uninstalled as well. It does not provide any graphical frontend, but some basic plotting and graph visualizations are available through matplotlib and pygraphviz. A simple workflow engine was added with release 0.3.0.
- Changes to previous version:
- upgraded to Weka 3.9.2
- properly initializing package support now, rather than adding package jars to classpath
- added weka.core.ClassHelper Java class for obtaining classes and static fields, as javabridge only uses the system class loader
- Supported Operating Systems: Agnostic
- Data Formats: Arff, Csv, Libsvm, Xrff
- Tags: Machine Learning, Weka
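As an illustration of the wrapper style described above, here is a minimal sketch that cross-validates J48 on an ARFF file. It assumes python-weka-wrapper and a working JVM are installed; the function name `crossvalidate_j48` and its parameters are made up for this example and are not part of the library.

```python
def crossvalidate_j48(arff_path, folds=10, seed=1):
    """Cross-validate Weka's J48 on an ARFF file and return the summary text.

    Hypothetical helper for illustration; not part of python-weka-wrapper.
    """
    # Imported lazily so that this module can be imported even where
    # python-weka-wrapper (and a JVM) is not available.
    import weka.core.jvm as jvm
    from weka.core.converters import Loader
    from weka.classifiers import Classifier, Evaluation
    from weka.core.classes import Random

    jvm.start()
    try:
        # Load the dataset and use the last attribute as the class.
        loader = Loader(classname="weka.core.converters.ArffLoader")
        data = loader.load_file(arff_path)
        data.class_is_last()

        # Evaluate an untrained J48 via cross-validation.
        classifier = Classifier(classname="weka.classifiers.trees.J48")
        evaluation = Evaluation(data)
        evaluation.crossvalidate_model(classifier, data, folds, Random(seed))
        return evaluation.summary()
    finally:
        # Always shut the JVM down, even if loading or evaluation fails.
        jvm.stop()
```

A call such as `print(crossvalidate_j48("iris.arff"))` would print Weka's evaluation summary, where `iris.arff` stands for any local ARFF file whose last attribute is the class.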
Other available revisions
Version 0.3.12
- upgraded to Weka 3.9.2
- properly initializing package support now, rather than adding package jars to classpath
- added weka.core.ClassHelper Java class for obtaining classes and static fields, as javabridge only uses the system class loader
Date: February 18, 2018, 04:29:24

Version 0.3.11
- added check_for_modified_class_attribute method to FilterClassifier class
- added complete_classname method to weka.core.classes module, which completes partial classnames like .J48 to weka.classifiers.trees.J48 if there is a unique match; JavaObject.new_instance and JavaObject.check_type now make use of this functionality, allowing for instantiations like Classifier(cls=".J48")
- jvm.start(system_cp=True) no longer fails with a KeyError: 'CLASSPATH' if there is no CLASSPATH environment variable defined
- Libraries mtl.jar, core.jar and arpack_combined_all.jar were added as is to the weka.jar in the 3.9.1 release instead of adding their content to it. Repackaged weka.jar to fix this issue.
Date: August 23, 2017, 01:17:24

Version 0.3.10
- "types.double_matrix_to_ndarray" no longer assumes a square matrix (https://github.com/fracpete/python-weka-wrapper/issues/48)
- "len(Instances)" now returns the number of rows in the dataset (module "weka.core.dataset")
- added method "insert_attribute" to the "Instances" class
- added class method "create_relational" to the "Attribute" class
- upgraded Weka to 3.9.1
Date: January 4, 2017, 10:21:33

Version 0.3.9
- plot_learning_curve method of module weka.plot.classifiers now accepts a list of test sets; * is index of test set in label template string
- added missing_value() methods to weka.core.dataset module and Instance class
- output variable y for convenience method create_instances_from_lists in module weka.core.dataset is now optional
- added convenience method create_instances_from_matrices to weka.core.dataset module to easily create an Instances object from numpy matrices (x and y)
Date: October 18, 2016, 22:55:00

Version 0.3.8
- works now with javabridge 1.0.14
Date: May 9, 2016, 04:17:42

Version 0.3.7
- upgraded Weka to 3.9.0
Date: May 4, 2016, 00:02:47

Version 0.3.6
- Loader.load_file method now checks whether the dataset file really exists; otherwise, a previously loaded file would get loaded again without an error message (seems to be a Weka issue)
- replaced org.pentaho.packageManagement with weka.core.packageManagement as the package management code is now part of Weka rather than a third-party library
- jvm.start() no longer tries to load packages and therefore suppresses error message if $HOME/wekafiles/packages should not yet exist
Date: April 2, 2016, 11:57:09

Version 0.3.5
- added support for weka.core.BatchPredictor to class Classifier in module weka.classifiers
- upgraded Weka to revision 12410 (post 3.7.13) to avoid performance bottleneck when using setOptions method
- fixed class SetupGenerator from module weka.core.classes
- added load_any_file method to the weka.core.converters module
- added save_any_file method to the weka.core.converters module
- if GridSearch instantiation (module weka.classifiers) fails, it now outputs a message asking whether the package is installed and whether the JVM was started with package support
Date: January 29, 2016, 05:22:58

Version 0.3.4
- added convenience method create_instances_from_lists to weka.core.dataset module to easily create an Instances object from numeric lists (x and y)
- added get_object_tags method to Tags class from module weka.core.classes, to allow obtaining weka.core.Tag array from the method of a JavaObject rather than a static field (MultiSearch)
- updated MultiSearch wrapper in module weka.classifiers to work with the multi-search package version 2016.1.15 or later
Date: January 29, 2016, 05:21:27

Version 0.3.3
- updated to Weka 3.7.13
- documentation now covers the API as well
Date: September 26, 2015, 06:11:42

Version 0.3.2
- The "packages" parameter of the "weka.core.jvm.start()" function can be used for specifying an alternative Weka home directory now as well
- added "train_test_split" method to "weka.core.Instances" class to easily create train/test splits
- "evaluate_train_test_split" method of "weka.classifiers.Evaluation" class now uses the "train_test_split" method
Date: June 28, 2015, 23:09:13

Version 0.3.1
- added "get_tags" class method to "Tags" class for easier instantiation of Tag arrays
- added "find" method to "Tags" class to locate "Tag" object that matches the string
- fixed "getitem" and "setitem" methods of the "Tags" class
- added "GridSearch" meta-classifier with convenience properties to module "weka.classifiers"
- added "SetupGenerator" and various parameter classes to "weka.core.classes"
- added "MultiSearch" meta-classifier with convenience properties to module "weka.classifiers"
- added "quote"/"unquote" and "backquote"/"unbackquote" methods to "weka.core.classes" module
- added "main" method to "weka.core.classes" for operations on options: join, split, code
- added support for option handling to "weka.core.classes" module
Date: April 23, 2015, 00:06:57

Version 0.3.0
- added method "ndarray_to_instances" to "weka.converters" module for converting Numpy 2-dimensional array into "Instances" object
- added method "plot_learning_curve" to "weka.plot.classifiers" module for creating learning curves for multiple classifiers for a specific metric
- added plotting of experiments with "plot_experiment" method in "weka.plot.experiments" module
- "Instance.create_instance" method now takes list of tuples (index, internal float value) when generating sparse instances
- added "weka.core.database" module for loading data from a database
- added "make_copy" class method to "Clusterer" class
- added "make_copy" class method to "Associator" class
- added "make_copy" class method to "Filter" class
- added "make_copy" class method to "DataGenerator" class
- most classes (like Classifier and Filter) now have a default classname value in the constructor
- added "TextDirectoryLoader" class to "weka.core.converters"
- moved all methods from "weka.core.utils" to "weka.core.classes"
- fixed "Attribute.index_of" method for determining label index
- fixed "Attribute.add_string_value" method (used incorrect JNI parameter)
- "create_instance" and "create_sparse_instance" methods of class "Instance" now ensure that list values are float
- added "to_help" method to "OptionHandler" class which outputs a help string generated from the base class's "globalInfo" and "listOptions" methods
- fixed "test_model" method of "Evaluation" class when supplying a "PredictionOutput" object (previously generated "No dataset structure provided!" exception)
- added "batch_finished" method to "Filter" class for incremental filtering
- added "line_plot" method to "weka.plot.dataset" module for plotting dataset using internal format (one line plot per instance)
- added "is_serializable" property to "JavaObject" class
- added "has_class" convenience property to "Instance" class
- added "repr" method to "JavaObject" classes (simply calls "toString()" method)
- added "Stemmer" class in module "weka.core.stemmers"
- added "Stopwords" class in module "weka.core.stopwords"
- added "Tokenizer" class in module "weka.core.tokenizers"
- added "StringToWordVector" filter class in module "weka.filters"
- added simple workflow engine (see documentation on Flow)
Date: April 15, 2015, 12:37:22

Version 0.2.2
- added convenience methods "no_class" (to unset class) and "has_class" (class set?) to "Instances" class
- switched to using faster method objects for methods "classify_instance"/"distribution_for_instance" in "Classifier" class
- switched to using faster method objects for methods "cluster_instance"/"distribution_for_instance" in "Clusterer" class
- switched to using faster method objects for methods "class_index", "is_missing", "get/set_value", "get/set_string_value", "weight" in "Instance" class
- switched to using faster method objects for methods "input", "output", "outputformat" in "Filter" class
- switched to using faster method objects for methods "attribute", "attribute_by_name", "num_attributes", "num_instances", "class_index", "class_attribute", "set_instance", "get_instance", "add_instance" in "Instances" class
Date: January 5, 2015, 03:43:56

Version 0.2.1
- added unit testing framework
- added method "refresh_cache()" to "weka/core/packages.py" to allow user to refresh local cache
- method "get_classname" in "weka.core.utils" now handles Python objects and class objects as well
- added convenience method "get_jclass" to "weka.core.utils" to instantiate a Java class
- added a "JavaArray" wrapper for arrays, which allows getting/setting elements and iterating
- added property "classname" to class "JavaObject" for easy access to classname of underlying object
- added class method "parse_matlab" for parsing Matlab matrix strings to "CostMatrix" class
- "predictions" method of "Evaluation" class now returns "None" if predictions are discarded
- "Associator.get_capabilities()" method is now a property: "Associator.capabilities"
- added wrapper classes for Java enums: "weka.core.classes.Enum"
- fixed retrieval of "sumSq" in "Stats" class (used by "AttributeStats")
- fixed "cluster_instance" method in "Clusterer" class
- fixed "filter" and "clusterer" properties in clusterer classes ("SingleClustererEnhancer", "FilteredClusterer")
- added "crossvalidate_model" method to "ClusterEvaluation"
- added "get_prc" method to "plot/classifiers.py" for calculating the area under the precision-recall curve
- "Filter.filter" method now handles list of "Instances" objects as well, applying the filter sequentially to all the datasets (allows generation of compatible train/test sets)
Date: January 5, 2015, 00:30:01

Version 0.2.0
NB: This release is not backwards compatible!
- requires "JavaBridge" 1.0.9 at least
- moved from Java-like get/set ("getIndex()" and "setIndex(int)") to nicer Python properties
- using Python properties (including read-only ones) wherever possible
- added "weka.core.version" for accessing the Weka version currently in use
- added "jwrapper" and "jclasswrapper" methods to "JavaObject" class (the mother of all objects in python-weka-wrapper) to allow generic access to an object's methods: http://pythonhosted.org//javabridge/highlevel.html#wrapping-java-objects-using-reflection
- added convenience methods "class_is_last()" and "class_is_first()" to "weka.core.Instances" class
- added convenience methods "delete_last_attribute()" and "delete_first_attribute()" to "weka.core.Instances" class
Date: December 22, 2014, 09:21:53

Version 0.1.17
- fixed "setup.py" to download Weka 3.7.12 instead of 3.7.11 (this time correct URL)
Date: December 17, 2014, 21:43:28

Version 0.1.16
- fixed setup.py to download Weka 3.7.12 instead of 3.7.11
Date: December 17, 2014, 21:38:31

Version 0.1.15
- fixed "Instance.set_value" method: https://github.com/fracpete/python-weka-wrapper/issues/24
- added sub-section "From source" to section on installing the library
- upgraded to Weka 3.7.12
Date: December 17, 2014, 21:11:43

Version 0.1.14
- fixed setup.py to include the jars again when using eggs (via include_package_data etc)
- added detailed instructions for installing the library
Date: December 16, 2014, 01:58:17

Version 0.1.13
- added "get_class" method to "weka.core.utils" which returns the Python class object associated with the classname in dot-notation
- "from_commandline" method in "weka.core.utils" now takes an optional "classname" argument, which is the classname (in dot-notation) of the wrapper class to return - instead of the generic "OptionHandler"
- added "Kernel" and "KernelClassifier" convenience classes to better handle kernel based classifiers
Date: November 1, 2014, 09:59:54

Version 0.1.12
- added "create_string" class method to the "Attribute" class for creating a string attribute
- ROC/PRC curves can now consist of multiple plots (i.e., multiple class labels)
- switched command-line option handling from "getopt" to "argparse"
- fixed Instance.get_dataset(self) method
- added iterators for: rows/attributes in dataset, values in dataset row
- incremental loaders can be iterated now
Date: October 17, 2014, 00:16:26

Version 0.1.11
- moved wekaexamples module to separate github project: https://github.com/fracpete/python-weka-wrapper-examples
- added "stratify", "train_cv" and "test_cv" methods to the Instances class
- fixed "to_summary" method of the Evaluation class: failed when providing a custom title
Date: September 25, 2014, 00:39:02

Version 0.1.10
- fixed adding custom classpath using jvm.start(class_path=[...])
Date: August 29, 2014, 05:00:14

Version 0.1.9
- added static methods to Instances class: summary, merge_instances, append_instances
- added methods to Instances class: delete_with_missing, equal_headers
Date: August 29, 2014, 04:58:45

Version 0.1.8
- fixed installer: MANIFEST.in now includes CHANGES.rst and DESCRIPTION.rst as well
Date: June 26, 2014, 02:38:12

Version 0.1.7
- fixed weka/plot/dataset.py imports to avoid error when testing for matplotlib availability
- Instance.create_instance (weka/core/dataset.py) now accepts Python list and Numpy array
Date: June 26, 2014, 02:14:16

Version 0.1.6
- added troubleshooting section for Mac OSX to documentation
- recompiled helper jars with 1.6 rather than 1.7
- added link to Google Group
Date: May 29, 2014, 06:02:41

Version 0.1.5
- added CostMatrix support in the classifier evaluation
- fixed various retrievals of double arrays (accessed them incorrectly as float arrays), like distributionForInstance for a classifier
- Instances object can now retrieve all (internal) values of an attribute/column as numpy array
- added plotting of cluster assignments to weka.plot.clusterers
- fixed weka.core.utils.from_commandline method
- fixed weka.classifiers.PredictionOutput (get/set_header methods)
- predictions can be turned into an Instances object now using weka.classifiers.predictions_to_instances
Date: May 23, 2014, 06:40:42

Version 0.1.4
- dependencies for plotting are now optional (pygraphviz, PIL, matplotlib)
- plots now support custom titles
Date: May 19, 2014, 03:02:32

Version 0.1.3
- improved documentation
- added PRC curve plot
- aligned to PEP8 style guidelines
- fixed a variety of minor bugs (in less commonly used methods)
- fixed lib directory reference in make files for Java helper classes
Date: May 17, 2014, 13:37:54

Version 0.1.2
- added matrix plot
- added scatter plot for two attributes
- fixes in constructors of classes
- added MultiFilter convenience class
- predictions (of classifiers) can now be collected and output using the PredictionOutput class
- added support for attribute statistics
Date: May 13, 2014, 07:11:07

Version 0.1.1
- constructors now take list of commandline options as well
- added Weka package support (list/install/uninstall)
- ROC plotting for classifiers
- improved code documentation
- added more examples
- added more datasets
- using javabridge 1.0.1 now
Date: May 2, 2014, 03:35:38

Version 0.1.0
Initial Announcement on mloss.org.
Date: April 28, 2014, 23:18:48
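
The partial classname completion introduced in 0.3.11 (".J48" resolving to weka.classifiers.trees.J48 when the match is unique) can be sketched in plain Python. This is an illustration of the idea only, not the library's implementation: the real method presumably consults the classes known to the running Weka instance, whereas this sketch uses a hard-coded, hypothetical list, and the name `complete_classname_sketch` is made up. Returning names that lack a leading dot unchanged is also an assumption.

```python
# Hypothetical stand-in for the class names Weka would report at runtime.
KNOWN_CLASSNAMES = [
    "weka.classifiers.trees.J48",
    "weka.classifiers.trees.RandomForest",
    "weka.classifiers.functions.SMO",
    "weka.filters.supervised.attribute.Discretize",
    "weka.filters.unsupervised.attribute.Discretize",
]


def complete_classname_sketch(partial, known=KNOWN_CLASSNAMES):
    """Expand a partial classname such as ".J48" if exactly one class matches."""
    # Assumption: names without a leading dot are already fully qualified.
    if not partial.startswith("."):
        return partial

    # A class matches if its full name ends with the partial name.
    matches = [name for name in known if name.endswith(partial)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError("no class matches %r" % partial)
    raise ValueError("%r is ambiguous: %s" % (partial, ", ".join(matches)))


print(complete_classname_sketch(".J48"))
```

With the list above, ".J48" has a single match, while ".Discretize" is rejected because both the supervised and the unsupervised filter end with that suffix.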