Brick : A Brick is the basic unit of storage in GlusterFS, represented by an export directory on a server in the trusted storage pool. A brick is expressed by combining a server with an export directory in the following format:
`SERVER:EXPORT` For example: `myhostname:/exports/myexportdir/`
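As a minimal sketch of the `SERVER:EXPORT` format described above, the following splits a brick string into its server and export directory. The `Brick` type and `parse_brick` helper are hypothetical names introduced for illustration; they are not part of any GlusterFS API, and the sketch assumes a plain hostname or IPv4 address (an IPv6 literal would need different handling).

```python
from typing import NamedTuple


class Brick(NamedTuple):
    """Hypothetical container for the two halves of a brick string."""
    server: str
    export: str


def parse_brick(spec: str) -> Brick:
    """Split a 'SERVER:EXPORT' string into server and export directory."""
    # Split on the first colon only, so colons inside the path are preserved.
    server, sep, export = spec.partition(":")
    if not sep or not server or not export.startswith("/"):
        raise ValueError(f"expected SERVER:EXPORT with an absolute path, got {spec!r}")
    return Brick(server, export)


brick = parse_brick("myhostname:/exports/myexportdir/")
print(brick.server)  # myhostname
print(brick.export)  # /exports/myexportdir/
```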
Volume : A volume is a logical collection of bricks. Most of the gluster management operations happen on the volume.
Subvolume : A brick after being processed by at least one translator is called a subvolume. In other words, a subvolume is a brick with a set of one or more translators (xlators) stacked on top of it.
Volfile : Volume (vol) files are configuration files that determine the behavior of the GlusterFS trusted storage pool. A volume file is a textual representation of a collection of modules (also known as translators) that together implement the various functions required. The modules are arranged in a graph. For example, a replicated volume's volfile would, among other things, have a section describing the replication translator and its tunables. This section describes how the volume replicates data written to it. Further, a client process that serves a mount point interprets its volfile and loads the translators described in it. While serving I/O, it passes each request through the modules in the order specified in the volfile.
At a high level, GlusterFS has three entities: server, client, and management daemon. Each of these entities has its own volume file. Volume files for servers and clients are generated by the management daemon after the volume is created. Server and client volfiles are located in the `/var/lib/glusterd/vols/VOLNAME` directory. The management daemon volfile is named `glusterd.vol` and is located in the `/etc/glusterfs/` directory.
glusterd : The daemon/service that manages volumes and cluster membership. It is required to run on all the servers in the trusted storage pool.
Cluster : A trusted pool of linked computers working together, resembling a single computing resource. In GlusterFS, a cluster is also referred to as a trusted storage pool.
Client : Any machine that mounts a GlusterFS volume. Any application that uses the libgfapi access mechanism can also be treated as a client in the GlusterFS context.
Server : The machine (virtual or bare metal) that hosts the bricks in which data is stored.
Block Storage : Block special files, or block devices, correspond to devices through which the system moves data in the form of blocks. These device nodes often represent addressable devices such as hard disks, CD-ROM drives, or memory regions. GlusterFS requires a filesystem (like XFS) that supports extended attributes.
Filesystem : A method of storing and organizing computer files and their data. Essentially, it organizes these files into a database for the storage, organization, manipulation, and retrieval by the computer's operating system.
Distributed File System : A file system that allows multiple clients to concurrently access data which is spread across servers/bricks in a trusted storage pool. Data sharing among multiple locations is fundamental to all distributed file systems.
Virtual File System (VFS) : VFS is a kernel software layer which handles all system calls related to the standard Linux file system. It provides a common interface to several kinds of file systems.
POSIX : Portable Operating System Interface (for Unix) is the name of a family of related standards specified by the IEEE to define the application programming interface (API), along with shell and utilities interfaces, for software compatible with variants of the Unix operating system. Gluster exports a fully POSIX-compliant file system.
Extended Attributes : Extended file attributes (abbreviated xattr) is a filesystem feature that enables users/programs to associate files/dirs with metadata.
FUSE : Filesystem in Userspace (FUSE) is a loadable kernel module for Unix-like computer operating systems that lets non-privileged users create their own filesystems without editing kernel code. This is achieved by running filesystem code in user space while the FUSE module provides only a "bridge" to the actual kernel interfaces.
GFID : Each file/directory on a GlusterFS volume has a unique 128-bit number associated with it called the GFID. This is analogous to the inode number in a regular filesystem.
InfiniBand : InfiniBand is a switched fabric computer network communications link used in high-performance computing and enterprise data centers.
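Because a GFID is a 128-bit value, it fits exactly into the 16 bytes of a UUID, and it is commonly written in UUID notation. The sketch below assumes the GFID is already available as 16 raw bytes; `format_gfid` is a hypothetical helper name, and the example value is arbitrary.

```python
import uuid


def format_gfid(raw: bytes) -> str:
    """Render a 128-bit GFID (16 raw bytes) in canonical UUID notation."""
    if len(raw) != 16:
        raise ValueError("a GFID is 128 bits, i.e. exactly 16 bytes")
    return str(uuid.UUID(bytes=raw))


# Arbitrary example value: fifteen zero bytes followed by 0x01.
example = bytes(15) + b"\x01"
print(format_gfid(example))  # 00000000-0000-0000-0000-000000000001
```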
Metadata : Metadata is data providing information about one or more other pieces of data.
Namespace : Namespace is an abstract container or environment created to hold a logical grouping of unique identifiers or symbols. Each Gluster volume exposes a single namespace as a POSIX mount point that contains every file in the cluster.
Node : A server or computer that hosts one or more bricks.
Open Source : Open source describes practices in production and development that promote access to the end product's source materials. Some consider open source a philosophy, others consider it a pragmatic methodology.
Before the term open source became widely adopted, developers and producers used a variety of phrases to describe the concept; open source gained hold with the rise of the Internet, and the attendant need for massive retooling of the computing source code. Opening the source code enabled a self-enhancing diversity of production models, communication paths, and interactive communities. Subsequently, a new, three-word phrase "open source software" was born to describe the environment that the new copyright, licensing, domain, and consumer issues created. Source: [Wikipedia]
Petabyte : A petabyte (derived from the SI prefix peta- ) is a unit of information equal to one quadrillion (short scale) bytes, or 1000 terabytes. The unit symbol for the petabyte is PB. The prefix peta- (P) indicates a power of 1000:
$1\,\mathrm{PB} = 1{,}000{,}000{,}000{,}000{,}000\,\mathrm{B} = 1000^{5}\,\mathrm{B} = 10^{15}\,\mathrm{B}$. The term "pebibyte" (PiB), using a binary prefix, is used for the corresponding power of 1024. Source: [Wikipedia]
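The difference between the decimal and binary prefixes can be checked with a few lines of arithmetic. The constant names `PB` and `PIB` are chosen here for illustration only.

```python
PB = 1000 ** 5   # petabyte: decimal (SI) prefix
PIB = 1024 ** 5  # pebibyte: binary prefix

print(PB)                   # 1000000000000000
print(PIB)                  # 1125899906842624
print(round(PIB / PB, 4))   # 1.1259
```

A pebibyte is therefore roughly 12.6% larger than a petabyte.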
Quorum : The configuration of quorum in a trusted storage pool determines the number of server failures that the trusted storage pool can sustain. If an additional failure occurs, the trusted storage pool becomes unavailable.
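To illustrate how a quorum setting bounds the number of tolerable server failures, here is a minimal sketch that assumes a simple rule: the pool is available only while strictly more than a given ratio of its servers are reachable. The functions `has_quorum` and `tolerated_failures` and the default ratio of 0.5 are assumptions made for this example; the actual quorum options in a GlusterFS deployment may behave differently.

```python
def has_quorum(active: int, total: int, ratio: float = 0.5) -> bool:
    """True while strictly more than `ratio` of the servers are reachable."""
    if total <= 0 or not 0 <= active <= total:
        raise ValueError("need 0 <= active <= total and total > 0")
    return active > total * ratio


def tolerated_failures(total: int, ratio: float = 0.5) -> int:
    """Largest number of failed servers for which quorum still holds."""
    failures = 0
    # Keep failing one more server as long as the remainder still has quorum.
    while failures < total and has_quorum(total - failures - 1, total, ratio):
        failures += 1
    return failures


for servers in (2, 3, 4, 5):
    print(servers, tolerated_failures(servers))
# 2 0
# 3 1
# 4 1
# 5 2
```

Under this assumed majority rule, one failure beyond the tolerated count makes `has_quorum` return `False`, which corresponds to the pool becoming unavailable.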
Quota : Quota allows you to set limits on usage of disk space by directories or by volumes.
RAID : Redundant Array of Inexpensive Disks (RAID) is a technology that provides increased storage reliability through redundancy, combining multiple low-cost, less-reliable disk drive components into a logical unit where all drives in the array are interdependent.
RDMA : Remote direct memory access (RDMA) is a direct memory access from the memory of one computer into that of another without involving either one's operating system. This permits high-throughput, low-latency networking, which is especially useful in massively parallel computer clusters.
Rebalance : The process of fixing the layout and redistributing data in a volume when a brick is added or removed.
RRDNS : Round Robin Domain Name Service (RRDNS) is a method to distribute load across application servers. RRDNS is implemented by creating multiple A records with the same name and different IP addresses in the zone file of a DNS server.
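The rotation behind RRDNS can be sketched without a real DNS server. The class below is a hypothetical in-memory stand-in for a zone holding several A records under one name; it returns the full record list on each query and rotates the order afterwards, on the assumption that clients usually connect to the first address they receive. The addresses come from the reserved documentation range.

```python
from collections import deque


class RoundRobinZone:
    """Hypothetical stand-in for a DNS zone with several A records for one name."""

    def __init__(self, addresses):
        self._records = deque(addresses)

    def query(self):
        """Return all A records, then rotate so the next query starts one later."""
        answer = list(self._records)
        self._records.rotate(-1)
        return answer


zone = RoundRobinZone(["192.0.2.10", "192.0.2.11", "192.0.2.12"])
for _ in range(4):
    print(zone.query()[0])
# 192.0.2.10
# 192.0.2.11
# 192.0.2.12
# 192.0.2.10
```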
Samba : Samba allows file and print sharing between computers running Windows and computers running Linux. It is an implementation of several services and protocols including SMB and CIFS.
Self-Heal : The self-heal daemon runs in the background, identifies inconsistencies in files/dirs in a replicated volume, and then resolves or heals them. This healing process is usually required when one or more bricks of a volume go down and later come back up.
Split-brain : This is a situation where data on two or more bricks in a replicated volume starts to diverge in terms of content or metadata. In this state, one cannot determine programmatically which set of data is "right" and which is "wrong".
Translator : Translators (also called xlators) are stackable modules where each module has a very specific purpose. Translators are stacked in a hierarchical structure called a graph. A translator receives data from its parent translator, performs the necessary operations, and then passes the data down to its child translator in the hierarchy.
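The stacking idea can be illustrated with a small object model. This is only a conceptual sketch: the classes `Translator`, `WriteCounter`, and `MemoryStore` are hypothetical and do not correspond to real GlusterFS translators or to the actual xlator interface. Each module does one job and hands the request to its child.

```python
class Translator:
    """Base module: forwards every write to its child unchanged."""

    def __init__(self, name, child=None):
        self.name = name
        self.child = child

    def write(self, path, data):
        return self.child.write(path, data)


class WriteCounter(Translator):
    """Counts the bytes passing through, then delegates to the child."""

    def __init__(self, name, child):
        super().__init__(name, child)
        self.bytes_seen = 0

    def write(self, path, data):
        self.bytes_seen += len(data)
        return super().write(path, data)


class MemoryStore(Translator):
    """Leaf module: keeps file contents in a dict instead of on disk."""

    def __init__(self, name):
        super().__init__(name)
        self.files = {}

    def write(self, path, data):
        self.files[path] = data
        return len(data)


# Build the stack bottom-up: passthrough -> counter -> store.
store = MemoryStore("store")
counter = WriteCounter("counter", store)
top = Translator("passthrough", counter)

written = top.write("/hello.txt", b"hello")
print(written, counter.bytes_seen, store.files)  # 5 5 {'/hello.txt': b'hello'}
```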
Trusted Storage Pool : A storage pool is a trusted network of storage servers. When you start the first server, the storage pool consists of that server alone.
Scale-Up Storage : Increases the capacity of the storage device in a single dimension. For example, adding additional disk capacity to an existing trusted storage pool.
Scale-Out Storage : Scale-out systems are designed to scale on both capacity and performance. It increases the capability of a storage device in a single dimension. For example, adding more systems of the same size, or adding servers to a trusted storage pool, increases CPU, disk capacity, and throughput for the trusted storage pool.
Userspace : Applications running in user space don’t directly interact with hardware, instead using the kernel to moderate access. Userspace applications are generally more portable than applications in kernel space. Gluster is a user space application.
Geo-Replication : Geo-replication provides a continuous, asynchronous, and incremental replication service from one site to another over Local Area Networks (LAN), Wide Area Networks (WAN), and across the Internet.
N-way Replication : Local synchronous data replication which is typically deployed across campus or Amazon Web Services Availability Zones.
Distributed Hash Table Terminology
Hashed subvolume : A Distributed Hash Table Translator subvolume to which the file or directory name is hashed.
Cached subvolume : A Distributed Hash Table Translator subvolume where the file content is actually present. For directories, the concept of cached-subvolume is not relevant. It is loosely used to mean subvolumes which are not hashed-subvolume.
Linkto-file : For a newly created file, the hashed and cached subvolumes are the same. When directory entry operations like rename (which can change the name and hence hashed subvolume of the file) are performed on the file, instead of moving the entire data in the file to a new hashed subvolume, a file is created with the same name on the newly hashed subvolume. The purpose of this file is only to act as a pointer to the node where the data is present. In the extended attributes of this file, the name of the cached subvolume is stored. This file on the newly hashed-subvolume is called a linkto-file. The linkto file is relevant only for non-directory entities.
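The rename behavior described above can be modeled in memory. The sketch below is illustrative only: the `Dht` class and subvolume names are hypothetical, `zlib.crc32` with a modulo is a stand-in for the real hash and layout lookup, and a plain dictionary plays the role of the extended attribute that stores the cached subvolume's name.

```python
import zlib

SUBVOLUMES = ["subvol-0", "subvol-1", "subvol-2"]  # hypothetical names


def hashed_subvolume(name):
    """Stand-in hash: crc32 modulo the subvolume count (not the real algorithm)."""
    return SUBVOLUMES[zlib.crc32(name.encode()) % len(SUBVOLUMES)]


class Dht:
    def __init__(self):
        self.data = {sv: {} for sv in SUBVOLUMES}    # name -> file content
        self.linkto = {sv: {} for sv in SUBVOLUMES}  # name -> cached subvolume

    def create(self, name, content):
        # For a new file, the hashed and cached subvolumes are the same.
        self.data[hashed_subvolume(name)][name] = content

    def locate(self, name):
        """Return the cached subvolume, following a linkto pointer if needed."""
        hashed = hashed_subvolume(name)
        if name in self.data[hashed]:
            return hashed
        return self.linkto[hashed][name]

    def rename(self, old, new):
        cached = self.locate(old)
        # The data stays on the cached subvolume; only its name changes there.
        self.data[cached][new] = self.data[cached].pop(old)
        # Drop any pointer left over for the old name.
        self.linkto[hashed_subvolume(old)].pop(old, None)
        # If the new name hashes elsewhere, leave a pointer instead of moving data.
        hashed = hashed_subvolume(new)
        if hashed != cached:
            self.linkto[hashed][new] = cached


dht = Dht()
dht.create("report.txt", b"contents")
dht.rename("report.txt", "summary.txt")
print(dht.locate("summary.txt"))  # the subvolume that still holds the data
```

Whether a linkto entry is created in this run depends on where the two names happen to hash; the lookup works the same way in both cases.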
Directory Layout : The directory layout specifies the hash ranges of a directory and the subvolumes to which those ranges correspond.
Properties of directory layouts : The layouts are created at the time of directory creation and are persisted as extended attributes of the directory. A subvolume is not included in the layout if it was offline at the time of directory creation and no directory entries (such as files and directories) of that directory have been created on that subvolume. Such a subvolume is not part of the layout until the fix-layout is complete as part of running the rebalance command. If a subvolume is down during access (after directory creation), access to any files that hash to that subvolume fails.
Fix Layout : A command that is executed during the rebalance process. The rebalance process itself comprises two stages. First, it fixes the layouts of directories to accommodate any subvolumes that are added or removed; it also heals the directories, checks whether the layout is non-contiguous, persists the layout in extended attributes if needed, and ensures that the directories have the same attributes across all the subvolumes. Second, it migrates the data from the cached subvolume to the hashed subvolume.
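A layout can be pictured as a sorted list of hash-range start points, one per subvolume. The sketch below assumes a 32-bit hash space split into equal contiguous ranges and uses `zlib.crc32` as a stand-in hash; the function names are hypothetical, and the real hash function and range assignment in GlusterFS differ. Only subvolumes passed to `build_layout` take part, which mirrors the point that a subvolume offline at creation time is left out.

```python
import zlib
from bisect import bisect_right

HASH_SPACE = 2 ** 32  # assumed 32-bit hash space


def build_layout(subvolumes):
    """Split the hash space into equal contiguous ranges: [(range_start, subvolume)]."""
    step = HASH_SPACE // len(subvolumes)
    return [(i * step, sv) for i, sv in enumerate(subvolumes)]


def subvolume_for_hash(layout, h):
    """Find the subvolume whose range contains hash value h."""
    starts = [start for start, _ in layout]
    # bisect_right gives the index of the first start greater than h;
    # the range containing h is the one just before it.
    return layout[bisect_right(starts, h) - 1][1]


def subvolume_for_name(layout, name):
    """Hash a file name (stand-in hash) and look up its subvolume."""
    return subvolume_for_hash(layout, zlib.crc32(name.encode()))


layout = build_layout(["subvol-0", "subvol-1", "subvol-2"])
print(subvolume_for_hash(layout, 0))               # subvol-0
print(subvolume_for_hash(layout, HASH_SPACE - 1))  # subvol-2
print(subvolume_for_name(layout, "example.txt") in {"subvol-0", "subvol-1", "subvol-2"})  # True
```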
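The two rebalance stages can be sketched as follows, under strong simplifying assumptions: the layout is recomputed as equal ranges over a 32-bit hash space, `zlib.crc32` stands in for the real hash, and `placement` is a hypothetical dictionary recording which subvolume currently holds each file. An actual rebalance works per directory and does considerably more (healing, attribute checks, persisting layouts).

```python
import zlib

HASH_SPACE = 2 ** 32  # assumed 32-bit hash space


def hashed_subvolume(name, subvolumes):
    """Pick the subvolume whose equal-sized hash range contains the name's hash."""
    h = zlib.crc32(name.encode())  # stand-in hash, not the one GlusterFS uses
    return subvolumes[h * len(subvolumes) // HASH_SPACE]


def rebalance(placement, subvolumes):
    """Return (name, source, destination) moves and update placement in place."""
    moves = []
    for name in sorted(placement):
        cached = placement[name]
        # Stage 1 (fix layout): the new layout decides the hashed subvolume.
        hashed = hashed_subvolume(name, subvolumes)
        # Stage 2 (migrate): move data from the cached to the hashed subvolume.
        if cached != hashed:
            moves.append((name, cached, hashed))
            placement[name] = hashed
    return moves


old = ["subvol-0", "subvol-1"]
placement = {f"file-{i}": hashed_subvolume(f"file-{i}", old) for i in range(8)}

new = old + ["subvol-2"]  # a subvolume is added
for name, src, dst in rebalance(placement, new):
    print(f"{name}: {src} -> {dst}")
```

Running `rebalance` a second time with the same subvolume list yields no moves, since every file already sits on its hashed subvolume.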