Commit

MAINT bump version for new release

mfeurer committed Feb 6, 2019
1 parent 8b3d823 commit 6771f4c

Showing 3 changed files with 46 additions and 51 deletions.
CHANGES.rst (9 changes: 2 additions and 7 deletions)
@@ -2,18 +2,13 @@
What's New in LIAC-ARFF
~~~~~~~~~~~~~~~~~~~~~~~

LIAC-ARFF 2.3.2
* fix: match all possible separator spaces to add quotes when encoding into
ARFF. These separator spaces will be preserved when decoding the ARFF files.

LIAC-ARFF 2.4

* enhancement: load data progressively with generator `return_type`.

LIAC-ARFF 2.4

* enhancement: standard Java escape sequences are now decoded in string
attributes, and non-printable characters are now encoded with escaping.
* fix: match all possible separator spaces to add quotes when encoding into
ARFF. These separator spaces will be preserved when decoding the ARFF files.
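The two encoding entries above (quoting values that contain separator spaces, and escaping non-printable characters) can be illustrated with a small round-trip sketch. `encode_string` and `decode_string` are hypothetical helpers, not liac-arff functions; the exact set of characters that triggers quoting and the `\uXXXX` form used for the remaining control characters are assumptions.

```python
import re

# Hypothetical helpers illustrating the 2.4 changelog entries; a simplified
# sketch, not liac-arff's actual implementation.

# Assumed set of characters that force quoting: any whitespace (so separator
# spaces survive a round trip), commas, quotes, braces, '%', backslashes and
# non-printable control characters.
_NEEDS_QUOTES = re.compile(r"[\s,'\"{}%\\\x00-\x1f\x7f]")

# Java-style escapes for the common special characters.
_ESCAPES = {'\\': '\\\\', "'": "\\'", '\n': '\\n', '\r': '\\r',
            '\t': '\\t', '\b': '\\b', '\f': '\\f'}
_SIMPLE = {'n': '\n', 'r': '\r', 't': '\t', 'b': '\b', 'f': '\f'}
_UNESCAPE = re.compile(r"\\(u[0-9a-fA-F]{4}|.)")


def encode_string(value):
    """Quote and escape a string value for the @DATA section."""
    if value != '' and not _NEEDS_QUOTES.search(value):
        return value
    out = []
    for ch in value:
        if ch in _ESCAPES:
            out.append(_ESCAPES[ch])
        elif ord(ch) < 0x20 or ord(ch) == 0x7f:
            # Other non-printable characters become \uXXXX escapes (assumed).
            out.append('\\u%04x' % ord(ch))
        else:
            out.append(ch)
    return "'" + ''.join(out) + "'"


def decode_string(token):
    """Reverse encode_string: strip the quotes and decode the escapes."""
    if not (len(token) >= 2 and token[0] == token[-1] == "'"):
        return token

    def repl(match):
        esc = match.group(1)
        if esc.startswith('u') and len(esc) == 5:
            return chr(int(esc[1:], 16))
        return _SIMPLE.get(esc, esc)

    return _UNESCAPE.sub(repl, token[1:-1])
```

Quoting every value that contains whitespace is what lets runs of separator spaces be preserved when the file is decoded again.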

LIAC-ARFF 2.3.1

arff.py (82 changes: 41 additions and 41 deletions)
@@ -28,48 +28,48 @@
'''
The liac-arff module implements functions to read and write ARFF files in
Python. It was created in the Connectionist Artificial Intelligence Laboratory
(LIAC), which is based at the Federal University of Rio Grande do Sul
(UFRGS), in Brazil.
ARFF (Attribute-Relation File Format) is a file format specially created to
describe datasets which are commonly used for machine learning experiments and
software. This file format was created to be used in Weka, the most
representative software for automated machine learning experiments.
An ARFF file can be divided into two sections: header and data. The header
describes the metadata of the dataset, including a general description of the
dataset, its name and its attributes. The source below is an example of a
header section of an XOR dataset::
%
% XOR Dataset
%
% Created by Renato Pereira
% rppereira@inf.ufrgs.br
% http://inf.ufrgs.br/~rppereira
%
%
@RELATION XOR
@ATTRIBUTE input1 REAL
@ATTRIBUTE input2 REAL
@ATTRIBUTE y REAL
The Data section of an ARFF file describes the observations of the dataset, in
the case of the XOR dataset::
@DATA
0.0,0.0,0.0
0.0,1.0,1.0
1.0,0.0,1.0
1.0,1.0,0.0
%
%
%
Notice that several lines start with a ``%`` symbol, denoting a
comment; lines with ``%`` at the beginning will be ignored, except for the
description part at the beginning of the file. The declarations ``@RELATION``,
``@ATTRIBUTE``, and ``@DATA`` are all case insensitive and obligatory.
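A minimal sketch of the structure described above: comment lines are skipped, the declarations are matched case-insensitively, and input lacking any of the three obligatory declarations is rejected. `parse_simple_arff` is a hypothetical name, not part of liac-arff; it assumes every attribute is numeric, ignores quoting and nominal types, and discards the leading description instead of keeping it, so it is far simpler than the module's real decoder.

```python
def parse_simple_arff(text):
    """Parse a tiny, all-numeric ARFF document into a dictionary.

    Simplifications (assumptions of this sketch): no quoted names, no
    nominal/string/date types, no sparse rows, and the leading description
    comments are discarded rather than kept.
    """
    relation, attributes, data = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        # Blank lines and '%' comment lines carry no data.
        if not line or line.startswith('%'):
            continue
        if in_data:
            # Every line after @DATA is one comma-separated instance.
            data.append([float(v) for v in line.split(',')])
            continue
        keyword = line.split(None, 1)[0].upper()  # declarations are case insensitive
        if keyword == '@RELATION':
            relation = line.split(None, 1)[1]
        elif keyword == '@ATTRIBUTE':
            _, name, type_ = line.split(None, 2)
            attributes.append((name, type_.upper()))
        elif keyword == '@DATA':
            in_data = True
        else:
            raise ValueError('unexpected header line: %r' % raw)
    # @RELATION, @ATTRIBUTE and @DATA are all obligatory.
    if relation is None or not attributes or not in_data:
        raise ValueError('missing @RELATION, @ATTRIBUTE or @DATA declaration')
    return {'relation': relation, 'attributes': attributes, 'data': data}


if __name__ == '__main__':
    xor = '\n'.join([
        '% XOR Dataset',
        '@relation XOR',
        '@ATTRIBUTE input1 REAL',
        '@attribute input2 real',
        '@ATTRIBUTE y REAL',
        '@DATA',
        '0.0,0.0,0.0',
        '0.0,1.0,1.0',
        '1.0,0.0,1.0',
        '1.0,1.0,0.0',
    ])
    print(parse_simple_arff(xor)['data'])
```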
For more information and details about the ARFF file description, consult
@@ -79,29 +79,29 @@
ARFF Files in Python
~~~~~~~~~~~~~~~~~~~~
This module uses built-in Python objects to represent a deserialized ARFF
file. A dictionary is used as the container of the data and metadata of ARFF,
and has the following keys:
- **description**: (OPTIONAL) a string with the description of the dataset.
- **relation**: (OBLIGATORY) a string with the name of the dataset.
- **attributes**: (OBLIGATORY) a list of attributes with the following
template::
(attribute_name, attribute_type)
the attribute_name is a string, and attribute_type must be a string
or a list of strings.
- **data**: (OBLIGATORY) a list of data instances. Each data instance must be
a list of values, one for each attribute.
The keys above are case sensitive and must be written exactly as described.
The attribute type ``attribute_type`` must be one of these
strings (they are not case sensitive): ``NUMERIC``, ``INTEGER``, ``REAL`` or
``STRING``. For nominal attributes, the ``attribute_type`` must be a list of
strings.
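The rules above lend themselves to a quick structural check. The sketch below uses a hypothetical `validate_arff_obj` helper, not part of liac-arff, that reports missing or misspelled keys, unknown attribute types and rows whose length does not match the attribute list; how the module itself reports such problems is not shown in this diff.

```python
_TYPES = {'NUMERIC', 'INTEGER', 'REAL', 'STRING'}


def validate_arff_obj(obj):
    """Return a list of problems found in an ARFF-style dictionary (hypothetical)."""
    problems = []
    # Keys are case sensitive, so 'Relation' does not count as 'relation'.
    for key in ('relation', 'attributes', 'data'):
        if key not in obj:
            problems.append('missing obligatory key %r' % key)
    if problems:
        return problems
    if 'description' in obj and not isinstance(obj['description'], str):
        problems.append('description must be a string')
    if not isinstance(obj['relation'], str):
        problems.append('relation must be a string')
    for i, attr in enumerate(obj['attributes']):
        if not (isinstance(attr, (tuple, list)) and len(attr) == 2
                and isinstance(attr[0], str)):
            problems.append('attribute %d must be a (name, type) pair' % i)
            continue
        type_ = attr[1]
        if isinstance(type_, str):
            # Type names themselves are not case sensitive.
            if type_.upper() not in _TYPES:
                problems.append('attribute %r has unknown type %r' % (attr[0], type_))
        elif not (isinstance(type_, list) and all(isinstance(v, str) for v in type_)):
            problems.append('nominal attribute %r must list strings' % attr[0])
    n = len(obj['attributes'])
    for i, row in enumerate(obj['data']):
        if len(row) != n:
            problems.append('data row %d has %d values, expected %d' % (i, len(row), n))
    return problems
```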
In this format, the XOR dataset presented above can be represented as a Python
object as::
xor_dataset = {
@@ -133,7 +133,7 @@
and lists of dictionaries as used by SVMLight
- Supports the following attribute types: NUMERIC, REAL, INTEGER, STRING, and
NOMINAL;
- Has an interface similar to other built-in modules such as ``json``, or
``zipfile``;
- Supports reading and writing the descriptions of files;
- Supports missing values and names with spaces;
@@ -146,7 +146,7 @@
__author_email__ = ('renato.ppontes@gmail.com, '
'feurerm@informatik.uni-freiburg.de, '
'joel.nothman@gmail.com')
__version__ = '2.3.1'
__version__ = '2.4.0'

import re
import sys
@@ -344,7 +344,7 @@ def __init__(self, value):
)

class BadAttributeType(ArffException):
'''Error raised when an invalid type is provided in the attribute
declaration.'''
message = 'Bad @ATTRIBUTE type, at line %d.'

@@ -361,7 +361,7 @@ def __init__(self, value, value2):
)

class BadNominalValue(ArffException):
'''Error raised when a value is used in some data instance but is not
declared in its respective attribute declaration.'''

def __init__(self, value):
@@ -381,7 +381,7 @@ def __init__(self, value):
)

class BadNumericalValue(ArffException):
'''Error raised when an invalid numerical value is used in some data
instance.'''
message = 'Invalid numerical value, at line %d.'

@@ -676,7 +676,7 @@ def _decode_comment(self, s):
characters.
This method must receive a normalized string, i.e., a string without
padding, including the "\r\n" characters.
:param s: a normalized string.
:return: a string with the decoded comment.
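As a sketch of what decoding a comment involves, assuming the decoder strips the leading `%` and at most one following space. The hypothetical `decode_comment` below is an illustration, not the library's `_decode_comment` method.

```python
import re

# '%' at the start of the line, optionally followed by a single space.
_COMMENT_PREFIX = re.compile(r'^%( )?')


def decode_comment(s):
    """Strip the comment marker from a single normalized comment line.

    `s` is assumed to be normalized already (no surrounding padding and no
    line terminator), as the docstring above requires.
    """
    if not s.startswith('%'):
        raise ValueError('not a comment line: %r' % s)
    return _COMMENT_PREFIX.sub('', s, count=1)
```

Joining the decoded lines of the leading comment block with newlines would rebuild a multi-line description.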
@@ -687,13 +687,13 @@ def _decode_comment(self, s):
def _decode_relation(self, s):
'''(INTERNAL) Decodes a relation line.
The relation declaration is a line with the format ``@RELATION
<relation-name>``, where ``relation-name`` is a string. The string must
start with an alphabetic character and must be quoted if the name includes
spaces, otherwise this method will raise a `BadRelationFormat` exception.
This method must receive a normalized string, i.e., a string without
padding, including the "\r\n" characters.
:param s: a normalized string.
:return: a string with the decoded relation name.
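The relation rules in this docstring (optional quoting, mandatory quoting when the name contains spaces, an alphabetic first character) can be sketched with a single regular expression. `decode_relation` and the local `BadRelationFormat` class are stand-ins written for illustration; they follow the documented rules rather than liac-arff's actual code.

```python
import re


class BadRelationFormat(ValueError):
    """Local stand-in for the exception named in the docstring."""


# '@RELATION' (any case), whitespace, then a name that is either quoted
# ('...' or "...") or a single run of non-whitespace characters.
_RELATION = re.compile(
    r"""^@relation\s+('(?P<sq>[^']*)'|"(?P<dq>[^"]*)"|(?P<bare>\S+))\s*$""",
    re.IGNORECASE)


def decode_relation(s):
    """Return the relation name from a normalized @RELATION line (sketch)."""
    m = _RELATION.match(s)
    if m is None:
        # Covers a missing name and unquoted names containing spaces.
        raise BadRelationFormat('bad @RELATION line: %r' % s)
    name = next(g for g in (m.group('sq'), m.group('dq'), m.group('bare'))
                if g is not None)
    if not name[:1].isalpha():
        raise BadRelationFormat('relation name must start with a letter: %r' % name)
    return name
```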
@@ -710,26 +710,26 @@ def _decode_relation(self, s):
def _decode_attribute(self, s):
'''(INTERNAL) Decodes an attribute line.
The attribute is the most complex declaration in an ARFF file. All
attributes must follow the template::
@attribute <attribute-name> <datatype>
where ``attribute-name`` is a string, quoted if the name contains any
whitespace, and ``datatype`` can be:
- Numerical attributes as ``NUMERIC``, ``INTEGER`` or ``REAL``.
- Strings as ``STRING``.
- Dates (NOT IMPLEMENTED).
- Nominal attributes with format:
{<nominal-name1>, <nominal-name2>, <nominal-name3>, ...}
The nominal names follow the rules for the attribute names, i.e., they
must be quoted if the name contains whitespace.
This method must receive a normalized string, i.e., a string without
padding, including the "\r\n" characters.
:param s: a normalized string.
:return: a tuple (ATTRIBUTE_NAME, TYPE_OR_VALUES).
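A sketch of the attribute template above, assuming single- or double-quoted names and single-quoted nominal values. `decode_attribute` and `BadAttributeFormat` are hypothetical names, not the module's; dates are rejected to mirror the NOT IMPLEMENTED note, and nominal values are whitespace-stripped, which is a simplification.

```python
import csv
import re


class BadAttributeFormat(ValueError):
    """Hypothetical error type for malformed @ATTRIBUTE lines."""


# '@ATTRIBUTE' (any case), a quoted or bare name, then the datatype text.
_ATTRIBUTE = re.compile(
    r"""^@attribute\s+('(?P<sq>[^']*)'|"(?P<dq>[^"]*)"|(?P<bare>[^\s{]+))\s+(?P<type>.+)$""",
    re.IGNORECASE)
_SIMPLE_TYPES = {'NUMERIC', 'INTEGER', 'REAL', 'STRING'}


def decode_attribute(s):
    """Return (ATTRIBUTE_NAME, TYPE_OR_VALUES) for one @ATTRIBUTE line."""
    m = _ATTRIBUTE.match(s.strip())
    if m is None:
        raise BadAttributeFormat('bad @ATTRIBUTE line: %r' % s)
    name = next(g for g in (m.group('sq'), m.group('dq'), m.group('bare'))
                if g is not None)
    type_ = m.group('type').strip()
    if type_.startswith('{') and type_.endswith('}'):
        # Nominal attribute: comma-separated values, optionally single-quoted.
        row = next(csv.reader([type_[1:-1]], quotechar="'", skipinitialspace=True))
        return name, [v.strip() for v in row]
    if type_.upper() in _SIMPLE_TYPES:
        return name, type_.upper()
    raise BadAttributeFormat('unsupported type %r (dates are not handled)' % type_)
```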
@@ -918,8 +918,8 @@ def _encode_comment(self, s=''):
def _encode_relation(self, name):
'''(INTERNAL) Encodes a relation line.
The relation declaration is a line with the format ``@RELATION
<relation-name>``, where ``relation-name`` is a string.
:param name: a string.
:return: a string with the encoded relation declaration.
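The encoding direction is the mirror image: wrap the name in quotes when it would otherwise break the declaration. In the hypothetical `encode_relation` below, the set of characters that triggers quoting and the choice of single quotes are assumptions, not taken from liac-arff.

```python
def encode_relation(name):
    """Build an '@RELATION <relation-name>' line, quoting the name if needed."""
    if not isinstance(name, str) or not name:
        raise ValueError('relation name must be a non-empty string')
    # Assumed set of characters that would otherwise break the declaration.
    if any(ch.isspace() or ch in "%{},'\"" for ch in name):
        # Escape backslashes first, then single quotes, before wrapping.
        name = "'%s'" % name.replace('\\', '\\\\').replace("'", "\\'")
    return '@RELATION %s' % name
```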
@@ -945,7 +945,7 @@ def _encode_attribute(self, name, type_):
- Dates (NOT IMPLEMENTED).
- Nominal attributes with format:
{<nominal-name1>, <nominal-name2>, <nominal-name3>, ...}
This method must receive the name of the attribute and its type; if
the attribute type is nominal, ``type`` must be a list of values.
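Encoding an attribute follows the same pattern, with a nominal type turned into a brace-enclosed list of values. `encode_attribute` and its `_quote` helper are illustrative, not the library's methods; the quoting rule and the `", "` separator between nominal values are assumptions.

```python
_SIMPLE_TYPES = {'NUMERIC', 'INTEGER', 'REAL', 'STRING'}


def _quote(token):
    """Single-quote a name or nominal value containing separators (assumed rule)."""
    if token == '' or any(ch.isspace() or ch in "%{},'\"" for ch in token):
        return "'%s'" % token.replace('\\', '\\\\').replace("'", "\\'")
    return token


def encode_attribute(name, type_):
    """Build an '@ATTRIBUTE <name> <datatype>' declaration line (sketch)."""
    if isinstance(type_, (list, tuple)):
        # Nominal attribute: the type is the list of allowed values.
        datatype = '{%s}' % ', '.join(_quote(str(v)) for v in type_)
    elif isinstance(type_, str) and type_.upper() in _SIMPLE_TYPES:
        datatype = type_.upper()
    else:
        raise ValueError('unsupported attribute type: %r' % (type_,))
    return '@ATTRIBUTE %s %s' % (_quote(name), datatype)
```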
@@ -978,7 +978,7 @@ def encode(self, obj):
def iter_encode(self, obj):
'''The iterative version of `arff.ArffEncoder.encode`.
This iteratively encodes a given object and returns, one by one, the
lines of the ARFF file.
:param obj: the object containing the ARFF information.
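The iterative style described here maps naturally onto a Python generator, which lets a large dataset be written line by line without building the whole document in memory. `iter_encode_sketch` and `encode_sketch` are hypothetical names, not liac-arff's API; the sketch handles dense data only, writes `None` as the missing-value marker `?`, and uses a simplified quoting rule.

```python
def iter_encode_sketch(obj):
    """Yield the lines of an ARFF document one by one (dense data only)."""
    def quote(v):
        # Assumed quoting rule: quote values containing separators or quotes.
        s = str(v)
        if s == '' or any(ch.isspace() or ch in "%{},'\"" for ch in s):
            return "'%s'" % s.replace('\\', '\\\\').replace("'", "\\'")
        return s

    for line in obj.get('description', '').splitlines():
        yield ('%% %s' % line).rstrip()
    yield '@RELATION %s' % quote(obj['relation'])
    yield ''
    for name, type_ in obj['attributes']:
        if isinstance(type_, (list, tuple)):
            type_ = '{%s}' % ', '.join(quote(v) for v in type_)
        yield '@ATTRIBUTE %s %s' % (quote(name), type_)
    yield ''
    yield '@DATA'
    for row in obj['data']:
        # None marks a missing value, written as '?'.
        yield ','.join('?' if v is None else quote(v) for v in row)


def encode_sketch(obj):
    """Non-iterative counterpart: join all generated lines."""
    return '\n'.join(iter_encode_sketch(obj))
```

Streaming to an open file is then a loop such as `for line in iter_encode_sketch(obj): fp.write(line + '\n')`.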
@@ -1042,7 +1042,7 @@ def iter_encode(self, obj):
# BASIC INTERFACE =============================================================
def load(fp, encode_nominal=False, return_type=DENSE):
'''Load a file-like object containing the ARFF document and convert it into
a Python object.
:param fp: a file-like object.
:param encode_nominal: boolean, if True perform a label encoding
@@ -1077,7 +1077,7 @@ def loads(s, encode_nominal=False, return_type=DENSE):
return_type=return_type)

def dump(obj, fp):
'''Serialize an object representing the ARFF document to a given file-like
object.
:param obj: a dictionary.

setup.py (6 changes: 3 additions and 3 deletions)
@@ -4,16 +4,16 @@
__author_email__ = ('renato.ppontes@gmail.com, '
'feurerm@informatik.uni-freiburg.de, '
'joel.nothman@gmail.com')
__version__ = '2.3.1'
__date__ = '2018 07 16'
__version__ = '2.4.0'
__date__ = '2019 02 06'

try:
import setuptools
except ImportError:
from ez_setup import use_setuptools
use_setuptools()

from setuptools import setup, find_packages
from setuptools import setup

try:
f = open('README.rst','rU')