Data Storage

Quotas and Usage

The quota allocated to a project covers all data stored, including automatically generated replicas (copies or mirrors of the primary data) and backups. Projects that require replicas of their data to be stored on a secondary resource, or that expect extensive backup needs (due to frequent modification of data), must plan for this extra space and include it in the total request submitted in the application. The awarded quotas must not be exceeded, and it is the person responsible for the project who must ensure that action is taken to reduce any excess usage. Extensive use of replicas implies that the space actually available for primary data is significantly less than the awarded quota.
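As a rough illustration of how replicas reduce the space left for primary data, the sketch below divides an awarded quota by the number of copies kept of each byte. The function name is hypothetical, and the calculation assumes that every replica is a full copy of the primary data and that backup overhead is ignored; actual accounting may differ.

```shell
# Hypothetical helper: estimate the space (in GB) left for primary data.
#   $1 = awarded quota in GB
#   $2 = number of full replicas kept of the primary data
# Assumption: replicas are full copies; backup overhead is ignored.
primary_capacity_gb() {
    quota_gb=$1
    replicas=$2
    # Each primary byte occupies (1 + replicas) bytes of quota.
    echo $(( quota_gb / (1 + replicas) ))
}

primary_capacity_gb 10000 1   # prints 5000
```

With one full replica, a 10000 GB quota leaves roughly 5000 GB for primary data before any backup space is counted.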

One can at any time check the current primary, replica and total usage, the awarded quotas, the number of files, etc. by using the command dusage. The command can report usage per user or per project. Command options are described in further detail by typing

dusage -h

Scratch file systems share the total space that is not used for user or project directories. Under Linux, use the df command to display the available scratch space before transferring large data outputs to this file system.
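The check with df can be scripted before a large transfer. The following is a minimal sketch; the function name is hypothetical, and the scratch path in the usage comment is only an assumed example, not a documented location.

```shell
# Hypothetical helper: succeed if the file system holding directory $1
# has at least $2 kilobytes available.
has_free_space() {
    dir=$1
    needed_kb=$2
    # -P forces the POSIX one-line-per-file-system format and -k reports
    # sizes in 1024-byte blocks; field 4 of line 2 is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$needed_kb" ]
}

# Usage (path and file name are assumptions, adjust to the actual system):
#   has_free_space /scratch 1048576 && cp big_output.dat /scratch/
```

Because other users write to the same scratch file system, the reported space can shrink between the check and the transfer, so the result should be treated as an estimate.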

If project space is used for storing data that falls outside the purpose and objectives of the project as approved by the Resource Allocation Committee, the project will be requested to remove this data as soon as possible or to apply for resources that allow such usage.

If scratch space is used for a purpose that is outside the scope of the NorStore project, this 'foreign' data may be purged without notice.

Compression of data

A project is encouraged to compress project data as much as possible. The following commands can be used for compression (Linux):

  • gzip <file> creates a file <file>.gz
  • bzip2 <file> creates a file <file>.bz2
  • ...

Compression using these commands does not result in loss of information. That is, uncompressing a compressed file returns a file with the original contents (though its time stamps may differ).
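The lossless property can be checked for any given file by compressing a copy, decompressing it again and comparing the result with the original. The sketch below does this with gzip; the function name is hypothetical, and the original file is left untouched because gzip writes to standard output (-c).

```shell
# Hypothetical helper: succeed if file $1 survives a gzip
# compress/decompress round trip byte for byte.
roundtrip_ok() {
    src=$1
    tmp=$(mktemp) || return 1
    # Compress to stdout, decompress straight away into a temporary file.
    gzip -c "$src" | gzip -dc > "$tmp"
    # cmp -s is silent and returns 0 only if both files are identical.
    cmp -s "$src" "$tmp"
    status=$?
    rm -f "$tmp"
    return $status
}

# Usage: roundtrip_ok results.dat && echo "contents preserved"
```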

bzip2 usually gives better compression, but may use more resources (compute time and system memory). It does not make sense to compress a file that has already been compressed (e.g., using bzip2).

As file compression may take considerable time, convince yourself in advance that it will actually lead to a useful reduction of the file size. Certain types of data are already stored in their most compact form.
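One way to judge in advance whether compression is worthwhile is to compress to standard output and compare byte counts, leaving the file itself unchanged. This is a minimal sketch with a hypothetical function name; for very large files it can be run on a representative sample instead, since the test compression itself takes time.

```shell
# Hypothetical helper: print the gzip-compressed size of file $1 as a
# percentage of its original size, without modifying the file.
compressed_percent() {
    file=$1
    # $(( ... )) strips the padding some wc implementations add.
    orig=$(( $(wc -c < "$file") ))
    comp=$(( $(gzip -c "$file" | wc -c) ))
    # An empty file cannot shrink; report 100 and avoid dividing by zero.
    if [ "$orig" -eq 0 ]; then
        echo 100
        return 0
    fi
    echo $(( comp * 100 / orig ))
}

# Usage: compressed_percent results.dat
```

A value close to 100 suggests the data is already compact and that compressing it would cost time without saving space.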

Compressing data that subsequently will be moved to a tape facility is often not beneficial as the data is usually compressed by the tape facility itself. Please ask the system administrators of (external) tape facilities for more information about the policies for compression.

By default, large files in project space may be compressed by the system administrators if they have not been touched for at least 90 days, unless there is a clear requirement from the project that this must not happen.

If compression of larger data files is required, a scratch partition on each resource is available for writing larger compressed datasets. Please note that no backup is provided for these scratch areas. The scratch areas are usually wiped automatically every night.

Backup, Mirror & Purge Policies

User and project space is backed up. Scratch space is not backed up. Users are advised to familiarize themselves with the backup and purge policies in order not to lose valuable data.

Backups of project data (data in project areas) are incremental (only modified files are copied) and are made every night. Versions of a file for the last 7 days are available from this backup, while a deleted file remains in backup for a month (30 days).

Backups of user data (home directories) are made every night to tape. Versions are available for 30 days and deleted files remain in backup for half a year (180 days).

Files in the scratch area may be purged once they have not been touched for more than 30 days. Users will normally be notified three working days in advance. In situations where space is urgently needed for projects, this notice may be omitted.
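To see which files are at risk before a purge, old files can be listed with find. The sketch below is illustrative only: the function name is hypothetical, and it assumes that "touched" can be approximated by the modification time, whereas the actual purge may be based on access time or another criterion.

```shell
# Hypothetical helper: list regular files under directory $1 that were
# last modified more than $2 days ago (default: 30).
# Assumption: "touched" is approximated by modification time (-mtime);
# use -atime instead if the purge is based on access time.
purge_candidates() {
    dir=$1
    days=${2:-30}
    find "$dir" -type f -mtime +"$days" -print
}

# Usage (directory is an assumption): purge_candidates /scratch/myuser
```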

Type      Quota        Purge     Backup
user      5 GB         none      nightly
project   predefined   none      nightly
scratch   none         30 days   none

The following data are excluded from any backup or replica:

  • contents of (sub)directories named tmp, temp, cache, scratch or junk (in lower or upper case). Use such directory names where possible but with care. Do not put any valuable data in such directories.
  • files named "core" or beginning with "core" (i.e. "core.*"). Files with the extension ".o" are also excluded (typically temporary files from compilers).
  • contents of (sub)directories with names ending with '_noreplica' (in lower or upper case). For example, a directory named 'testrun_noreplica' will not be mirrored to other sites or media and will be the only version stored on the resource (with all the risks of data loss this implies). This option should be applied to data which is not critical, data which is also stored in other data centers or locally, or data which can be regenerated by computations.
  • contents of (sub)directories with names ending with '_nobackup' (in lower or upper case). This has the same effect as the '_noreplica' option.
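The naming rules above can be approximated in a small check that tells whether a given file path would fall under one of the exclusions. This is only a sketch of one reading of the rules, not the logic of the actual backup system: the function name is hypothetical, the argument is assumed to be a file path, directory names are matched case-insensitively, and the "core" and ".o" file rules are assumed to be case-sensitive.

```shell
# Hypothetical helper: succeed if file path $1 matches one of the
# name-based exclusion rules listed above.
is_excluded() {
    path=$1
    # File rules: names beginning with "core" or ending in ".o".
    base=${path##*/}
    case $base in
        core*|*.o) return 0 ;;
    esac
    # Directory rules apply to every directory component of the path.
    case $path in
        */*) dirs=${path%/*} ;;
        *)   return 1 ;;          # no directory part to inspect
    esac
    # Lower-case the directory part and wrap it in slashes so that each
    # component can be matched as a whole word.
    dirs="/$(printf '%s' "$dirs" | tr '[:upper:]' '[:lower:]')/"
    case $dirs in
        */tmp/*|*/temp/*|*/cache/*|*/scratch/*|*/junk/*) return 0 ;;
        *_noreplica/*|*_nobackup/*) return 0 ;;
    esac
    return 1
}

# Usage: is_excluded proj/testrun_noreplica/out.dat && echo "not backed up"
```

Matching whole components means that a directory such as "template" is not mistaken for "temp".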

Each project should consider to what extent there is a need to make backups or replicas (mirrors) of data on NorStore. All data (i.e. the sum of primary, replica and backup) is charged towards the allocated quota.