SIRUTAlib - a library for querying the SIRUTA database

This project aims to create a library that can import a SIRUTA database and offer simple access to the different elements of the database.

What is SIRUTA?

SIRUTA is the official classification of Romanian towns and villages (hereafter called entities). It is maintained by the National Institute of Statistics.

It gives every entity a 6-digit code (a 5-digit unique code followed by a 1-digit checksum). The classification is hierarchical, with Romania (the country) as the root, followed by the 41 counties and Bucharest. Bucharest contains the city of Bucharest, which in turn contains 6 sectors. Every county contains municipalities, cities and communes, each of which consists of towns and villages.

The SIRUTA archives contain detailed documentation about the whole classification, including the algorithm for the checksum.

Note

This library assumes that SIRUTA codes shorter than 6 digits are padded with zeros on the left when calculating the checksum. There are 77 codes that do not respect this assumption. Of those, 76 validate if the code is padded with zeros on the right instead. The remaining code is 9026.
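To make the padding question concrete, here is a minimal sketch of the check-digit test as we reconstruct it from the INSSE description. The weights 7, 5, 3, 2, 1 and the step that adds up the digit sums of each weighted digit are our assumptions, not code taken from sirutalib; the sketch agrees with the code 10 that this document gives for Alba county. It pads on the left by default, as sirutalib does, and can pad on the right to try the fallback described in the note.

```python
"""Sketch of a SIRUTA check-digit test (our reconstruction, not sirutalib's code)."""

# Assumed weights for the first five digits of the zero-padded 6-digit code,
# read from left to right. The sixth digit is the check digit.
WEIGHTS = (7, 5, 3, 2, 1)


def digit_sum(number: int) -> int:
    """Return the sum of the decimal digits of a non-negative integer."""
    return sum(int(digit) for digit in str(number))


def checksum_ok(code: int, pad: str = "left") -> bool:
    """Check a SIRUTA code after padding it to 6 digits on the given side."""
    text = str(code)
    if code < 0 or len(text) > 6:
        return False
    # sirutalib assumes left padding; "right" tries the fallback that,
    # according to the note, works for 76 of the 77 exceptions.
    text = text.rjust(6, "0") if pad == "left" else text.ljust(6, "0")
    # Add up the digit sums of each weighted digit, then map to 0-9.
    total = sum(digit_sum(int(d) * w) for d, w in zip(text[:5], WEIGHTS))
    expected = (11 - total % 10) % 10
    return expected == int(text[5])


if __name__ == "__main__":
    print(checksum_ok(10))                 # Alba county
    print(checksum_ok(9026))               # left padding
    print(checksum_ok(9026, pad="right"))  # right padding
```

Under this reconstruction 9026 fails with both padding directions, which matches the note.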

Getting the library

You can either download the tar file (mirror) or get the source code, as described in the Development section.

In both cases, you will also get a copy of the most recent SIRUTA database in CSV format.

Development

Dependencies

  • a recent version of Python is required in order to develop with SIRUTAlib
  • this library uses Git for source control, so you’ll need it if you want to get the full source code
  • if you want to build the help files, you’ll also need Sphinx and make (the latter is optional)

Getting the source

To work on the SIRUTAlib code, you only need a local repository checkout:

$ git clone https://github.com/strainu/SIRUTA.git
$ cd SIRUTA

You will find two Python files:
  • sirutalib.py contains the actual library
  • testsiruta.py contains the tests needed to check the code.

That’s it, enjoy!

Using the library

A simple usage example is available in the INSTALL file.

Contributing

If you plan to contribute code to SIRUTAlib, please keep a few things in mind:
  • code should be formatted according to PEP 8
  • tests should be written for all the new code, as long as you don’t need to change class internals to test it

Then prepare a patch and submit a pull request on GitHub.

For more contact options, see Feedback and contact.

Feedback and contact

You can report bugs, request features or open pull requests on GitHub: https://github.com/strainu/SIRUTA/issues

If you want to contact the author, you can do it by emailing siruta [at] strainu.ro. All the latest information is available on the project’s page.

Installation

In order to install SIRUTAlib, you just need to extract the archive and run the following command (usually as root):

# python setup.py install

Then, you can just import sirutalib in your program:

#!/usr/bin/python
import sirutalib

if __name__ == "__main__":
    siruta = sirutalib.SirutaDatabase()
    print(siruta.get_name(10))  # 10 is the SIRUTA code for Alba county

Modules

sirutalib

Library created to parse a SIRUTA CSV extract and allow simple access to the resulting database

exception sirutalib.SirutaCodeWarning

Bases: builtins.UserWarning

This class defines a new warning type, specific to SIRUTA codes with errors. It is used solely to identify the warnings issued by this module.

args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
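Because SirutaCodeWarning is an ordinary UserWarning subclass, callers can filter it with the standard warnings module. The sketch below defines a stand-in class with the same name (so it runs without sirutalib) and a hypothetical check_code function that warns on out-of-range codes. warnings.simplefilter("error", ...) then turns only this category into exceptions, which is similar in spirit to the enforce_warnings option of SirutaDatabase, although we have not checked how that option is implemented.

```python
import warnings


class SirutaCodeWarning(UserWarning):
    """Stand-in for sirutalib.SirutaCodeWarning, used only in this sketch."""


def check_code(code: int) -> int:
    """Hypothetical helper: warn (but continue) on codes outside 1..999999."""
    if not 0 < code < 1_000_000:
        warnings.warn(f"suspicious SIRUTA code {code}", SirutaCodeWarning)
    return code


# Record the warning instead of printing it; execution continues.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", SirutaCodeWarning)
    check_code(-1)
print(len(caught), caught[0].category.__name__)

# Escalate only this category to an exception; other warnings are unaffected.
with warnings.catch_warnings():
    warnings.simplefilter("error", SirutaCodeWarning)
    try:
        check_code(-1)
    except SirutaCodeWarning as exc:
        print("raised:", exc)
```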

class sirutalib.SirutaDatabase(filename='siruta.csv', enforce_warnings=False)

Bases: builtins.object

The main class, representing the SIRUTA database.

It reads data from a CSV file. The expected input format is: SIRUTA;DENLOC;CODP;JUD;SIRSUP;TIP;NIV;MED;REGIUNE;FSJ;FS2;FS3;FSL;rang;fictiv

Documentation for these fields can be found on the INSSE website.

Parameters:
  • filename – the CSV file containing the data, given either as an absolute path or as a path relative to the current folder
  • enforce_warnings – treat warnings as exceptions
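The input is a semicolon-separated file whose header row names the fifteen columns listed above. A minimal sketch of reading such a file with the standard csv module follows. Only the column names come from this document; the single data row is made up for illustration, and sirutalib's own parser may differ, for example in encoding or type conversion.

```python
import csv
import io

# Column names as documented for the SirutaDatabase input file.
HEADER = ("SIRUTA;DENLOC;CODP;JUD;SIRSUP;TIP;NIV;MED;"
          "REGIUNE;FSJ;FS2;FS3;FSL;rang;fictiv")
# Made-up row for illustration only; these are not real SIRUTA values.
SAMPLE = HEADER + "\n" + "100;EXAMPLE VILLAGE;0;1;90;22;3;3;7;;;;;;0\n"


def load_rows(stream):
    """Yield one dict per entity, keyed by column name, with int codes."""
    for row in csv.DictReader(stream, delimiter=";"):
        row["SIRUTA"] = int(row["SIRUTA"])
        row["SIRSUP"] = int(row["SIRSUP"])
        yield row


rows = list(load_rows(io.StringIO(SAMPLE)))
print(rows[0]["SIRUTA"], rows[0]["DENLOC"], rows[0]["SIRSUP"])
```

For a real extract you would pass open(path, newline="", encoding=...) instead of the StringIO; the encoding of the INSSE file is not stated here.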
get_all_counties(self, prefix=True)

Get all county names from the database

Parameters:prefix – If False, only the county name is returned, otherwise the prefix used for counties in the database is prepended
Returns:a list of all county names in Romania
Return type:list
get_code_by_name(self, name)

Get the entity’s code for the given name

get_county(self, siruta)

Get the entity’s county for the given siruta code as int

Parameters:siruta (int) – The SIRUTA code for which we want the county
Returns:the county code if available, None otherwise
Return type:int
get_county_by_name(self, name)

Get the entity’s county for the given name

get_county_name(self, siruta, prefix=True)

Alias of get_county_string

Parameters:siruta (int) – The SIRUTA code for which we want the county
Return type:string
get_county_string(self, siruta, prefix=True)

Get the entity’s county for the given siruta code as string

Parameters:siruta (int) – The SIRUTA code for which we want the county
Return type:string
get_inf_codes(self, siruta)

Get all the entities that have the given siruta code as superior code

Parameters:siruta (int) – The SIRUTA code for which we want the codes of the inferior entities
Returns:a list of entities that have siruta as their superior code, None if there are no such entities
Return type:list
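Each record stores only its superior code (SIRSUP), so the inferior codes returned by get_inf_codes amount to the inverse of that mapping, while get_sup_code follows it upwards. The sketch below shows both directions on a toy parent table; the codes are invented and the functions are illustrative, not sirutalib internals. Like get_inf_codes, the hypothetical inferior_codes returns None when no entity points at the given code.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Toy data: entity code -> superior code (SIRSUP). Invented values; the
# root of the hierarchy is given superior code 0 here.
SUPERIOR: Dict[int, int] = {1: 0, 10: 1, 20: 10, 30: 10, 40: 20}


def build_inferior_index(superior: Dict[int, int]) -> Dict[int, List[int]]:
    """Invert the code -> superior mapping into superior -> [codes]."""
    index: Dict[int, List[int]] = defaultdict(list)
    for code, sup in superior.items():
        index[sup].append(code)
    return dict(index)


INFERIOR = build_inferior_index(SUPERIOR)


def inferior_codes(code: int) -> Optional[List[int]]:
    """Codes whose superior is `code`, or None if there are none."""
    return INFERIOR.get(code)


def ancestors(code: int) -> List[int]:
    """Superior codes of `code`, nearest first, stopping at the root."""
    chain = []
    while SUPERIOR.get(code, 0) != 0:
        code = SUPERIOR[code]
        chain.append(code)
    return chain


print(inferior_codes(10))  # codes directly under 10
print(ancestors(40))       # 20, then 10, then 1
```

Building the inverse index once keeps each lookup to a dictionary access instead of a scan over the whole database.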
get_last_error(self)
get_name(self, siruta, prefix=True)

Get the entity name for the given siruta code

Parameters:
  • siruta (int) – The SIRUTA code for which we want the name
  • prefix (bool) – True if we want the name with entity type, False if we only want the name
Returns:The name of the entity or None if the code is not in the database
Return type:string

get_postal_code(self, siruta)

Get the entity’s postal code for the given siruta code

Parameters:siruta (int) – The SIRUTA code for which we want the postal code
Returns:The postal code of the entity, None if the SIRUTA code is not in the database or 0 if the entity has more than one postal code
Return type:string
get_postal_code_by_name(self, name)

Get the entity’s postal code for the given name

get_region(self, siruta)

Get the entity’s region for the given siruta code

Parameters:siruta (int) – The SIRUTA code for which we want the region
Returns:the region code if available, None otherwise
Return type:int
get_region_by_name(self, name)

Get the entity’s region for the given name

get_region_name(self, siruta)

Alias of get_region_string

Parameters:siruta (int) – The SIRUTA code for which we want the region
Returns:the region name if available, None otherwise
Return type:string
get_region_string(self, siruta)

Get the entity’s region for the given code as string

Parameters:siruta (int) – The SIRUTA code for which we want the region
Returns:the region name if available, None otherwise
Return type:string
get_siruta_list(self, county_list=None, type_list=None)

Get a list of SIRUTA codes for entities matching the limitations imposed by both the county and type parameters

Parameters:
  • county_list (list) – List of counties for which we want the codes
  • type_list (list) – List of types for which we want the codes
Returns:List of codes matching the limitations or an empty list
Return type:list
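The description says a code must satisfy both the county and the type limitation. One plausible reading, sketched below, treats None as "no restriction" for that filter. That interpretation, the Entity record layout and the data are our assumptions, not taken from sirutalib.

```python
from typing import Iterable, List, NamedTuple, Optional


class Entity(NamedTuple):
    """Hypothetical record holding only the fields this filter needs."""
    siruta: int
    county: int       # JUD column
    entity_type: int  # TIP column


# Invented records for illustration only.
ENTITIES = [Entity(100, 1, 3), Entity(200, 1, 22), Entity(300, 2, 22)]


def siruta_list(entities: Iterable[Entity],
                county_list: Optional[List[int]] = None,
                type_list: Optional[List[int]] = None) -> List[int]:
    """Codes matching both filters; a None filter matches everything."""
    return [e.siruta for e in entities
            if (county_list is None or e.county in county_list)
            and (type_list is None or e.entity_type in type_list)]


print(siruta_list(ENTITIES, county_list=[1], type_list=[22]))
```

When no entity matches, the comprehension yields an empty list, consistent with the documented return value.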

get_sup_code(self, siruta)

Get the superior entity code for the given siruta code

Parameters:siruta (int) – The SIRUTA code for which we want the superior’s code
Returns:The code of the superior entity or None if the code is not in the database
Return type:string
get_sup_code_by_name(self, name)

Get the superior entity code for the given name

get_sup_name(self, siruta, prefix=True)

Get the superior entity name for the given siruta code

Parameters:
  • siruta (int) – The SIRUTA code for which we want the name of the superior entity
  • prefix (bool) – True if we want the name with entity type, False if we only want the name
Returns:The name of the superior entity or None if the code is not in the database
Return type:string

get_sup_name_by_name(self, name)

Get the superior entity name for the given name

get_type(self, siruta)

Get the entity’s type for the given siruta code

Parameters:siruta (int) – The SIRUTA code for which we want the type
Returns:the entity’s type if available, None otherwise
Return type:int
get_type_by_name(self, name)

Get the entity’s type for the given name

get_type_string(self, siruta)

Get the entity’s type for the given siruta code as string

Parameters:siruta (int) – The SIRUTA code for which we want the type
Returns:the entity type description if available, None otherwise
Return type:string
reset_diacritics_params(self)

Reset the parameters for diacritics to the default values (i.e. what we have in the file)

set_diacritics_params(self, cedilla=False, acircumflex=True, nodia=False)

Choose whether to return diacritics with cedilla or comma-below, and with â or î

Parameters:
  • cedilla (bool) – True if we should return diacritics with cedillas, False if we should return diacritics with comma-below
  • acircumflex (bool) – True if we are to return names with Â, False if names with Î are required
  • nodia (bool) – True if diacritics should be stripped, False otherwise
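The cedilla/comma-below choice comes down to swapping four characters, and stripping diacritics can be done with Unicode decomposition. The sketch below shows both with the standard library. The function name is hypothetical, and the acircumflex switch is left out on purpose: choosing between â and î depends on spelling rules (position in the word, words derived from "român") that a plain character mapping cannot capture. sirutalib's own implementation may differ.

```python
import unicodedata

# Romanian s/t with cedilla (legacy forms) paired with the comma-below forms.
PAIRS = [
    ("\u015e", "\u0218"),  # Ş / Ș
    ("\u015f", "\u0219"),  # ş / ș
    ("\u0162", "\u021a"),  # Ţ / Ț
    ("\u0163", "\u021b"),  # ţ / ț
]
TO_COMMA = str.maketrans({ced: com for ced, com in PAIRS})
TO_CEDILLA = str.maketrans({com: ced for ced, com in PAIRS})


def convert_diacritics(name: str, cedilla: bool = False,
                       nodia: bool = False) -> str:
    """Hypothetical helper mirroring the cedilla and nodia switches."""
    if nodia:
        # Decompose (e.g. ă -> a + combining breve) and drop the marks.
        decomposed = unicodedata.normalize("NFD", name)
        return "".join(c for c in decomposed if not unicodedata.combining(c))
    return name.translate(TO_CEDILLA if cedilla else TO_COMMA)


name = "\u015etef\u0103ne\u015fti"  # "Ştefăneşti", written with cedillas
print(convert_diacritics(name))              # comma-below forms
print(convert_diacritics(name, nodia=True))  # diacritics stripped
```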
siruta_is_valid(self, siruta)

Utility function that checks whether the SIRUTA code is valid according to the checksum algorithm published on insse.ro

Parameters:siruta (int) – The SIRUTA code to validate
Returns:True if the code is valid, False otherwise
Return type:bool

testsiruta

Test module for sirutalib

class testsiruta.TestSirutaCsv(methodName='runTest')

Bases: unittest.case.TestCase

assertItemsEqual(self, first, second, msg=None)

An unordered sequence comparison asserting that first and second contain the same elements, regardless of order. If the same element occurs more than once, it verifies that the elements occur the same number of times. Roughly equivalent to:

self.assertEqual(Counter(list(first)), Counter(list(second)))
Example:
  • [0, 1, 1] and [1, 0, 1] compare equal.
  • [0, 0, 1] and [0, 1] compare unequal.
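In Python 3 the standard unittest.TestCase provides this check as assertCountEqual (Python 2.7 called it assertItemsEqual), which is presumably why the test class defines its own version. A small self-contained check of the two examples above, using the standard method rather than the testsiruta helper:

```python
import unittest
from collections import Counter


class CountEqualExamples(unittest.TestCase):
    def test_same_multiset(self):
        # Same elements with the same multiplicities, different order.
        self.assertCountEqual([0, 1, 1], [1, 0, 1])

    def test_different_multiplicity(self):
        # 0 occurs twice on the left but only once on the right.
        self.assertNotEqual(Counter([0, 0, 1]), Counter([0, 1]))
        with self.assertRaises(AssertionError):
            self.assertCountEqual([0, 0, 1], [0, 1])


if __name__ == "__main__":
    # argv/exit keep this usable from an interactive session as well.
    unittest.main(argv=["count_equal_examples"], exit=False)
```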
setUp(self)
test_county_name(self)
test_db_size(self)
test_diacritics_variations(self)
test_get_all_counties(self)
test_get_code_by_name(self)
test_get_county(self)
test_get_county_by_name(self)
test_get_county_string(self)
test_get_inf_codes(self)
test_get_last_error(self)
test_get_name(self)
test_get_postal_code(self)
test_get_postal_code_by_name(self)
test_get_region(self)
test_get_region_by_name(self)
test_get_region_string(self)
test_get_siruta_list(self)
test_get_sup_code(self)
test_get_sup_code_by_name(self)
test_get_sup_name(self)
test_get_sup_name_by_name(self)
test_get_type(self)
test_get_type_by_name(self)
test_get_type_string(self)
test_siruta_is_valid(self)