Table of Contents
The program zcomp constructs a ZDD from a data file that represents a set family and outputs the ZDD as a data file. Conversely, the program unzcomp receives a data file that represents a ZDD and outputs a decompressed set family.
A hypergraph is a pair H=(V,E) of a set V and a family E of subsets of V, where the sets in E are called hyperedges. A Zero-suppressed BDD (ZDD) is a well-accepted data structure for hypergraphs(, where we identify hypergraphs with set families).
CU Decision Diagram Package Release 2.5.0 is used in our program to manipulate ZDDs.
For detailed information, see README
in the directory cudd-2.5.0
and also on-line document.
The requirements for an input file of zcomp and an output file of unzcomp are:
For example:
2 4 7 7 8 9 9 10
Given the data above, our program compresses it into a ZDD and outputs the following data.
2: (~10?1:1) 3: (~9?0:2) 4: (~8?0:1) 5: (~7?3:4) 6: (~7?0:1) 7: (~4?0:6) 8: (~2?5:7)
A sequential list of branch instructions I_{m}, I_{m-1},...,I_{1}, where
This is called Knuth format in our code because it is introduced in his book. See pp.206-207 in "The Art of Computer Programming, Volume 4a (2011)".
Compile ZDD library (i.e. CUDD) first and then compile our code in the top directory.
$ tar zxvf zcomp-1.2.0.tar.gz $ cd zcomp-1.2.0/cudd-2.5.0 $ make $ cd .. $ make
The above method is basically sufficient in Linux.
For MacOSX and Cygwin on Windows, the compilation of CUDD may fail.
In this case, it may be good to modify Makefile
in the directory cudd-2.5.0
so that for MacOSX, all lines concerning XCFLAGS are commented out, and for Cygwin on Windows, the line of XCFLAG in the "Windows 95/98/NT/XP/Vista/7 with Cygwin" section is only uncommented (and other lines are commented).
If compilation fails in spite of Linux, select approapriate XCFLAG according to your environment such as architechture and gcc version.
Usage: ./zcomp [option] input-file [output-file] Compression Method -z zcomp: bottom-up with sorting (default) -n naive Read Order, available only for naive method // this means the order for input to be read -o original order (default) -r random order -f forward order (lexicographic order) -b backward order Output File Format -k Knuth file format (default) -d dot file format Verification -c comparison with native method $./unzcomp input-file [output-file]
If output-filename is not given, zcomp and unzcomp do not produce any datafile.
ZDDs can be visualized: execute zcomp with -d option and convert the output dot file into image file using Graphviz. See Graphviz - Graph Visualization Software.
$ cat data/sample-rand.dat 7 8 2 4 7 9 10 9 $ ./zcomp -d data/sample-rand.dat sample.dot $ dot -Tpng sample.dot -o sample.png
The created DOT file and image file in PNG format.
zcomp and unzcomp do not maintain reference counts of ZDD nodes! Hence, an error will occur if either the number of nodes exceeds the limit size or the memory space available in a computer is exhausted.
zcomp and unzcomp access ZDD library through bdd_interface.h
, in which all functions for ZDD manipulation are defined.
Perhaps, you can easily change to other ZDD libraries, if you like, by modifying bdd_interface.h
and Makefile
in the top directory.
In general, compresssion of data just aims at reducing data sizes.
However, as for ZDD, compression is not an end in itself and can be considered as a first step to later processing.
An important thing is that zcomp makes it easier to efficiently manipulate set families because ZDD offers vairous basic operations for set families and these operations do without explictly decomposing ZDDs, thereby time complexity only depends on ZDD sizes.
This means that whatever size an input set family is, one can efficiently apply set-theoretical operations over ZDDs, as long as intermediate ZDD sizes are in a permissible range during computation.
In order to implement this approach, i.e., application of ZDD operations after compression, it is sufficient to insert such operations in the main function of zcomp.c
.
If the macros MISC_LOG, TIME_LOG, SIZE_LOG are defined in Makefile
, the following information will be shown after execution. The meaning of each line is described as a comment following //.
$ ./zcomp data/sample-rand.dat date Fri May 23 21:09:42 2014 // executed time program ZCOMP-1.2.0 // program name and version package CU Decision Diagram Package Release 2.5.0 // used package input data/sample-rand.dat // input file name method zcomp: bottom-up with sorting // compression method format knuth // output file format generator_type default // selected random number generator first_value 1787880323 // the first generated random number max_value 2147483647 // the maximum value of random numbers seed 1400846982 // seed used to produce a random number maxval 10 // the maximum value of items #row 4 // the number of rows #entry 8 // the number of entries with repetition GC disabled // garbage collection was disabled time(COMP) 0.00 sec // compression time |zdd| 9 // the number of nodes in a ZDD #sets 4 // the number of sets represented by a ZDD uncmp_size 8 // the uncompresssed size of a ZDD
A huge number of sets may be generated in unzcomp and disk space may be exhausted. To avoid disk overflow, take measure such as using ulimit command.