File Storage
============

MSI provides two storage tiers for different stages of research workflows. For a service-level overview, capacity guidance, and additional policy information, see the MSI `Storage page `__.

Choosing the right storage location
-----------------------------------

.. list-table::
   :header-rows: 1
   :widths: 20 30 20 30

   * - Location
     - Best for
     - How you access it
     - Important limits
   * - Home directory
     - Personal configuration files, source code, small working files, and lightweight software environments.
     - POSIX filesystem access on MSI systems, Open OnDemand, SSH.
     - Limited capacity and file count. Use project space rather than ``$HOME`` for shared or fast-growing research data.
   * - Project space
     - Active shared research data that a PI group needs to keep on the high-performance filesystem.
     - POSIX filesystem access on MSI systems, Open OnDemand, SSH.
     - Intended for active work. Snapshots are available, but groups still need their own retention plan for important data.
   * - Global scratch
     - Temporary job data, staging, and short-lived intermediate outputs.
     - POSIX filesystem access on MSI systems, Open OnDemand, SSH.
     - Data older than 30 days is deleted automatically. No snapshots or backups.
   * - Tier 2
     - Large-scale shared storage, inactive data, data sharing, and S3-compatible workflows.
     - ``s3cmd``, ``rclone``, and `Globus `__.
     - Not a mounted filesystem. No snapshots or MSI-managed backups.

.. _tier-1-storage:

Tier 1
------

Tier 1 is MSI's high-performance filesystem for active research data. It includes private home directories, shared project space, and global scratch.

For common workflows that move data between MSI, local systems, Google Drive, and external collaborators, see :doc:`Transferring Data To and From MSI `.

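
Because global scratch data older than 30 days is deleted automatically, it can help to check which files are approaching that age before they disappear. The following is a minimal sketch, not an MSI-provided tool: the function name ``list_aging_files`` and the 25-day default are hypothetical, and it assumes file age is judged by modification time, which may not match how the automatic cleanup measures age. It relies on GNU ``find``.

.. code-block:: bash

   # Hypothetical helper: list regular files under DIR that were last
   # modified more than DAYS full days ago (default: 25).
   # Assumption: modification time approximates the age used by the
   # automatic scratch cleanup; the real policy may differ.
   list_aging_files() {
       local dir="${1:?usage: list_aging_files DIR [DAYS]}"
       local days="${2:-25}"
       find "$dir" -type f -mtime +"$days" -print
   }

For example, ``list_aging_files "$SCRATCH"`` prints candidate files under your scratch directory so you can copy anything you still need to project space or Tier 2.
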
Default allocations
~~~~~~~~~~~~~~~~~~~

- Home directory: 200 GB and 1 million files per user
- Project space: 150 GB and 5 million files per group by default
- Global scratch: 40 TB and 10 million files per group

Common Tier 1 paths
~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   # Private home directory
   /users/[0-9]/$USER

   # Project space
   /projects/standard/PROJECT_NAME
   /projects/regulated/PROJECT_NAME

   # Shared and public project directories
   $SHARED
   $PUBLIC

   # Global scratch
   /scratch.global/$USER
   $SCRATCH

Project directories are organized for shared group access, so MSI does not create private home directories directly under the root of a project. If you need a personal working area inside project space, create a directory inside the project's shared area:

.. code-block:: bash

   mkdir -v $SHARED/$USER

This keeps the top-level project directory organized while still giving each user a predictable place for personal working files inside the group's Tier 1 space.

Data Insurances
~~~~~~~~~~~~~~~

MSI takes several precautions to reduce the risk of data loss on Tier 1, but those protections are not a substitute for your own backups of irreplaceable data.

Snapshots
^^^^^^^^^

Snapshots retain the same file structure and permissions that existed when the snapshot was taken. They are available for:

- Home directories under ``/users/*/*``
- Project directories under ``/projects/*/PROJECT_NAME``, including subdirectories and SURFs folders

Snapshots are not available for ``/scratch.global`` or Tier 2 storage.

Snapshot schedule
^^^^^^^^^^^^^^^^^

- Daily snapshots are typically named like ``snapshot_2026-04-15_05_00_00_UTC``
- Weekly snapshots are typically named like ``snapshot_2026-04-12_00_00_00_UTC``
- Daily snapshots are retained for 6 days
- Weekly snapshots are retained for 4 weeks

Access and list snapshots
^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   # List snapshots for your home directory
   ls -1 $HOME/.snapshot

   # Example output
   snapshot_2026-03-22_00_00_00_UTC
   snapshot_2026-03-29_00_00_00_UTC
   snapshot_2026-04-05_00_00_00_UTC
   snapshot_2026-04-09_05_00_00_UTC
   snapshot_2026-04-10_05_00_00_UTC
   snapshot_2026-04-11_05_00_00_UTC
   snapshot_2026-04-12_00_00_00_UTC
   snapshot_2026-04-12_05_00_00_UTC
   snapshot_2026-04-13_05_00_00_UTC
   snapshot_2026-04-14_05_00_00_UTC
   snapshot_2026-04-15_05_00_00_UTC

   # List snapshots for your project's shared directory
   ls -1 $SHARED/.snapshot

   # Browse a specific snapshot
   ls -lah "$SHARED/.snapshot/snapshot_2026-04-12_00_00_00_UTC"

Copy data back from a snapshot
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   # Restore a file from a home directory snapshot
   cp -a "$HOME/.snapshot/snapshot_2026-04-14_05_00_00_UTC/example.txt" "$HOME/"

   # Restore a project file into your personal work directory in shared space
   cp -a "$SHARED/.snapshot/snapshot_2026-04-12_00_00_00_UTC/results/output.csv" "$SHARED/$USER/"

   # If you are already in the directory that contains the deleted file,
   # you can inspect the matching path inside a snapshot first
   ls -lah .snapshot/snapshot_2026-04-14_05_00_00_UTC

You must still have permission to write to the destination where you are restoring data.

Disaster recovery
^^^^^^^^^^^^^^^^^

MSI also maintains periodic tape backups for disaster recovery from these Tier 1 locations:

- ``/projects/standard/GROUP/public/disaster_recovery``
- ``/projects/standard/GROUP/shared/disaster_recovery``

These tape backups are intended for rare cases where snapshots are not usable, such as a catastrophic storage or data center event. They are not scoped for routine, user-directed file restores, and recovered data may not represent the exact point in time you want. Users should still maintain their own secondary copies of difficult-to-recreate data.

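
Since tape backups are not meant for routine restores, snapshots remain the practical first place to look for a lost file. The sketch below finds the newest snapshot that still contains a given path. The function name ``latest_snapshot_copy`` is hypothetical, and the approach assumes every snapshot directory follows the ``snapshot_YYYY-MM-DD_HH_MM_SS_UTC`` naming pattern, so that sorting the names alphabetically also sorts them by time.

.. code-block:: bash

   # Hypothetical helper: print the path of RELATIVE_PATH inside the newest
   # snapshot under BASE_DIR/.snapshot that contains it.
   # Returns non-zero if no snapshot contains the path.
   latest_snapshot_copy() {
       local base="${1:?usage: latest_snapshot_copy BASE_DIR RELATIVE_PATH}"
       local rel="${2:?usage: latest_snapshot_copy BASE_DIR RELATIVE_PATH}"
       local snap newest=""
       # The glob expands in sorted order; with timestamped names the last
       # match that contains the path is the most recent one.
       for snap in "$base"/.snapshot/snapshot_*; do
           if [ -e "$snap/$rel" ]; then
               newest="$snap/$rel"
           fi
       done
       [ -n "$newest" ] && printf '%s\n' "$newest"
   }

A possible use is ``cp -a "$(latest_snapshot_copy "$HOME" example.txt)" "$HOME/"``, which restores the most recent snapshotted copy of ``example.txt``, provided one exists and you can write to the destination.
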
As of December 2, 2025, MSI notes that tape backups are not yet available for ``/projects/regulated/GROUP/shared/disaster_recovery``.

.. _tier-2-storage:

Tier 2
------

Tier 2 is MSI's object storage platform for large-scale data management, collaboration, and workflows that benefit from S3-compatible access. For service details, see MSI's `Tier 2 storage page `__.

Typical Tier 2 workflow
~~~~~~~~~~~~~~~~~~~~~~~

Tier 2 is commonly used as shared lab storage managed at the PI or group level:

- A PI can request help creating an initial shared bucket and receives a default Tier 2 allocation of 120 TB
- Group members have a default personal quota of 5 GB
- The PI or bucket administrator can grant collaborators read and write access to the shared bucket
- Lab members can then upload, retrieve, and share data without placing all long-term storage pressure on Tier 1

Access methods
~~~~~~~~~~~~~~

Tier 2 buckets use S3-style object storage rather than a mounted filesystem. Common access methods include:

- ``s3cmd`` for command-line access on MSI systems
- ``rclone`` for scripting and syncing workflows
- `Globus `__ for managed transfers

Bucket names are typically referenced like this:

.. code-block:: bash

   s3://BUCKET_NAME

Additional capacity and backups
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Tier 2 data is not protected by snapshots or MSI-managed backups. If data stored there is deleted or lost, MSI may not be able to recover it. PIs and research groups should plan their own backup strategy for important Tier 2 data.

Groups that need more Tier 2 capacity can request additional storage by reviewing the MSI `Service Catalog `__.
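
As an illustration of the ``s3cmd`` access method and the ``s3://BUCKET_NAME`` form described under Access methods, the sketch below wraps three standard ``s3cmd`` subcommands (``ls``, ``put``, ``get``). The wrapper names are hypothetical, ``BUCKET_NAME`` and the object prefixes are placeholders, and the sketch assumes ``s3cmd`` is already configured with valid Tier 2 credentials for your account.

.. code-block:: bash

   # Hypothetical wrappers around s3cmd; bucket and prefix values are placeholders.

   # List objects in a bucket, optionally under a prefix.
   tier2_list() {
       s3cmd ls "s3://${1:?bucket name required}/${2:-}"
   }

   # Upload a local file to a bucket, optionally under a prefix.
   tier2_upload() {
       s3cmd put "${1:?local file required}" "s3://${2:?bucket name required}/${3:-}"
   }

   # Download an object to a local path (default: current directory).
   tier2_download() {
       s3cmd get "s3://${1:?bucket name required}/${2:?object key required}" "${3:-.}"
   }

For example, ``tier2_upload results.tar.gz BUCKET_NAME backups/`` would run ``s3cmd put results.tar.gz s3://BUCKET_NAME/backups/``. Because Tier 2 has no snapshots or MSI-managed backups, an upload that overwrites an existing object may not be reversible.
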