|
UniverSVM
Support Vector Machine with Large Scale CCCP Functionality
Author: Fabian Sinz, Version: 1.1
Collaborators: Ronan Collobert, Jason Weston and Leon Bottou
Overview
Installation
Unzip the compressed file. If you downloaded the source, compile it with one of the following commands:
make all : compile the learning algorithm and the ascii to binary conversion tool
make universvm : compile only the learning algorithm
make libsvm2bin : compile only the ascii to binary conversion tool
make bin2libsvm : compile only the binary to ascii conversion tool
make mex : compile UniverSVM as a MEX file
Usage
UniverSVM consists of a single learning module (universvm), which is used both to train a model and to apply a learned model to new examples.
If a model file is specified after the training file, the learned model is stored there. This is currently not implemented for multiclass learning, cross validation, or universum variant 1. For now, a simple rule is: if the model has only one set of alphas, it can be stored in the model file.
If a model is supplied with the switch -F, UniverSVM tests that model on the data supplied as training data and on the test data supplied by -T. So
universvm [options] -T testfile trainfile
has the same effect as
universvm [options] trainfile model
universvm [options] -F model testfile
UniverSVM is called with the following parameters:
universvm [options] training_set_file [model_file]
Available options are:
-T test_set_file: test model on test set
-U universum_file : use universum (it's also possible to include universum points
with label -2 in the training file)
-F model_file : Test the model stored in model_file on training AND test data (specified by -T)
-u unlabeled_data_file : use unlabeled data (transductive SVM).
(it's also possible to include unlabeled points
with label -3 in the training file)
-B file format : files are stored in the following format:
0 -- libsvm ascii format (default)
1 -- binary format
2 -- split file format
-f file : output report file to given destination
-D file : output function values on test set(s) to given destination
OPTIMIZATION OPTIONS:
-V universum variant:
0 -- Standard universum training (default)
1 -- Train SVM with universum by making it a 3-class multiclass
problem and adding the decision rules for {+1,U} vs. -1 and
{-1,U} vs. +1 (0=off default)
This switch works only for binary at the moment.
2 -- Train universum with ramp loss. This option requires "-o 1".
-o optimizer: set different optimizers
0 -- quadratic program
1 -- convex concave procedure (if you choose a transductive SVM,
this option will be chosen automatically)
-G gap : set gap parameter for universum (default 0.05)
-I use_ridge : Add the ridge 1/C to the kernel matrix.
-r coef0 : set coef0 in kernel function (default 0)
-c cost : set the parameter C of C-SVC (default 1)
-C cost : set the parameter C for universum points
-a cost : set the parameter C for balancing constraint
-z cost : set the parameter C for unlabeled points
-m cachesize : set cache memory size in MB (default 256)
-e epsilon : set tolerance of termination criterion (default 0.001)
-s s : s parameter for ramp loss (default: -1 )
-S s : s parameter for transductive SVM loss (default: 0)
MODEL OPTIONS:
-t kernel_type : set type of kernel function (default 0)
0 -- linear: u'*v
1 -- polynomial: (gamma*u'*v + coef0)^degree
2 -- radial basis function: exp(-gamma*|u-v|^2)
3 -- sigmoid: tanh(gamma*u'*v + coef0)
4 -- custom: k(x_i,x_j) = X(i,j)
-d degree : set degree in kernel function (default 3)
-g gamma : set gamma in kernel function (default 1/k)
-b bias: use constraint sum alpha_i y_i =0 (default 1=on)
-w weight: the rhs of the balancing constraint (default = sum(y_i))
-v n : do cross validation with n folds
-M k : perform a multiclass training on k classes labeled with k different
integers >= 0 (default: 0)
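The kernel formulas listed under -t can be written out directly. The following is a minimal Python sketch of kernel types 0 to 3 for dense vectors, not UniverSVM's own implementation. The function names are illustrative, and reading the default gamma of "1/k" as one over the number of features is an assumption. Kernel type 4 (custom) simply looks up a precomputed matrix entry X(i,j) and is omitted.

```python
import math

def dot(u, v):
    """Inner product u'*v of two equal-length dense vectors."""
    return sum(a * b for a, b in zip(u, v))

def linear(u, v):
    """Kernel type 0: u'*v."""
    return dot(u, v)

def polynomial(u, v, gamma, coef0=0.0, degree=3):
    """Kernel type 1: (gamma*u'*v + coef0)^degree."""
    return (gamma * dot(u, v) + coef0) ** degree

def rbf(u, v, gamma):
    """Kernel type 2: exp(-gamma*|u-v|^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)

def sigmoid(u, v, gamma, coef0=0.0):
    """Kernel type 3: tanh(gamma*u'*v + coef0)."""
    return math.tanh(gamma * dot(u, v) + coef0)

u = [1.0, 2.0]
v = [3.0, 0.5]
# Assumption: the "k" in the default gamma 1/k is the number of features.
gamma = 1.0 / len(u)

print(linear(u, v))
print(polynomial(u, v, gamma))
print(rbf(u, v, gamma))
print(sigmoid(u, v, gamma))
```

The defaults for coef0 (0) and degree (3) mirror the defaults listed for -r and -d.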
File formats
We support three file formats: LIBSVM/SVMLight ascii, binary and split files.
LIBSVM/SVMLight ascii format
<line> .=. <target> <feature>:<value> <feature>:<value> ... <feature>:<value>
<target> .=. +1 | -1 | -3 | <int>
<feature> .=. <integer>
<value> .=. <float>
The target value and each of the feature/value pairs are separated by a space character. In classification mode, the target value denotes the class of the example: +1 marks a positive example and -1 a negative example. So, for example, the line
-1 1:0.43 3:0.12 9284:0.2
specifies a negative example for which feature number 1 has the value 0.43, feature number 3 has the value 0.12, feature number 9284 has the value 0.2, and all the other features have value 0. A class label of -3 indicates that this example should be classified using transduction. In multiclass classification the class should be a positive integer (remember to specify the "-M" option).
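To make the ascii format concrete, here is a minimal Python sketch that parses one such line into a target and a sparse feature dictionary. The function name parse_libsvm_line is hypothetical and not part of UniverSVM. The sample line is the negative example described above, and features that do not appear in the line are treated as 0.

```python
def parse_libsvm_line(line):
    """Parse one '<target> <feature>:<value> ...' line.

    Returns (target, features), where features maps feature number to value.
    Features absent from the line are implicitly 0 and are not stored.
    """
    fields = line.split()
    target = int(fields[0])  # +1, -1, -3 (unlabeled) or a multiclass integer
    features = {}
    for pair in fields[1:]:
        index, value = pair.split(":", 1)
        features[int(index)] = float(value)
    return target, features

target, features = parse_libsvm_line("-1 1:0.43 3:0.12 9284:0.2")
print(target)                  # -1
print(features)                # {1: 0.43, 3: 0.12, 9284: 0.2}
print(features.get(2, 0.0))    # 0.0, feature 2 is not listed
```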
Binary files
Split files
file_name: mnist.trn.bin
binary_file: 1
supply_indices: 1
supply_new_labels: 1
3 -3
4 -3
7 -3
...
The first line specifies a data file in either ascii or binary format to load. The second line indicates whether that file is binary (set to 1) or not (set to 0). The third line specifies whether you wish to load a subset of the given file (set to 1), and the fourth line, "supply_new_labels", indicates whether you wish to relabel the data differently from the original file. Following the first four lines is a list of either <index> <label> pairs (if supply_new_labels is set to 1) or only an <index> on each line. These indices (starting from 1) specify examples in the original file.
Example
We give an example of text classification taken from Chapelle and Zien, 2005. You will need the following training, testing and unlabeled data files, available here.
1) Running a standard SVM with no unlabeled data (linear kernel):
universvm -c 100 -T text.tst1 text.trn1
gives
Training done ...
---------------------------------
Testing
---------------------------------
Testing on test set with 1896 examples:
===========================================
Accuracy= 81.0127(1536/1896)
===========================================
2) Running TSVM:
universvm -c 100 -z 0.1 -S -0.3 -u text.tst1 -T text.tst1 text.trn1
gives
Training done ...
---------------------------------
Testing
---------------------------------
Testing on test set with 1896 examples:
===========================================
Accuracy= 93.7236(1777/1896)
===========================================
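The option list notes that unlabeled points can also be included in the training file with label -3 instead of being passed in a separate file with -u. The following Python sketch shows one way to prepare such a combined file from ascii-format lines. The function name mark_unlabeled and the sample lines are hypothetical, and whether a combined file reproduces the exact numbers above has not been checked here.

```python
def mark_unlabeled(lines):
    """Replace the target of each non-empty ascii-format line with -3."""
    relabeled = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Keep the feature:value pairs, overwrite only the target.
        relabeled.append(" ".join(["-3"] + fields[1:]))
    return relabeled

# Hypothetical sample data in LIBSVM/SVMLight ascii format.
labeled = ["+1 1:0.5 4:1.0", "-1 2:0.25"]
unlabeled = ["-1 1:0.1 3:0.7", "+1 2:0.9"]

combined = labeled + mark_unlabeled(unlabeled)
for line in combined:
    print(line)
```

Writing the combined lines to a single file gives a training file in which the last two examples are treated as unlabeled.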
Feedback/Bug Reports
If you find any bugs or have useful feedback, please send me an email. Please do not forget to attach a detailed description about how to reproduce the bug. |