GitHub – seaweedfs/seaweedfs

Overview

SeaweedFS is an open-source, distributed file system designed for high scalability and fast file access. It efficiently stores billions of files with minimal metadata overhead (40 bytes per file) and supports features like replication, erasure coding, cloud integration, and POSIX-compatible directories. Built for simplicity, it offers O(1) disk read operations, making it ideal for small files while also handling large files via chunking. SeaweedFS includes tools like S3-compatible APIs, WebDAV, and Kubernetes CSI support, and is licensed under Apache 2.0.

What is SeaweedFS? 🌊📁

SeaweedFS is a distributed file system designed to store and serve billions of files efficiently while maintaining high performance. It is open-source (Apache 2.0 licensed) and optimized for small files, though it can handle large files via chunking. Unlike traditional file systems, SeaweedFS minimizes metadata overhead and avoids bottlenecks by distributing file metadata across volume servers rather than centralizing it.


Core Features ✨

1. High Scalability & Performance

  • Stores billions of files with minimal overhead (just 40 bytes per file for metadata).
  • O(1) disk read operations—files are accessed in a single disk read, making it extremely fast for small files.
  • Linear scalability—add more volume servers to increase storage capacity without complex rebalancing.
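
To make the 40-byte figure concrete, here is a minimal back-of-the-envelope sketch. The helper name `metadata_bytes` is made up for illustration, and the calculation assumes the 40 bytes quoted above is the only per-file cost being counted.

```shell
# Hypothetical helper: estimate total metadata size for a given file count,
# assuming the 40 bytes per file quoted above.
metadata_bytes() {
  files=$1
  echo $(( files * 40 ))
}

# One billion files -> 40,000,000,000 bytes, i.e. about 40 GB of metadata.
metadata_bytes 1000000000
```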

2. Distributed Architecture

  • Master Server: Manages volume locations (static metadata) and assigns file IDs.
  • Volume Servers: Store actual file data and manage file metadata (volume ID, offset, size).
  • Filer (Optional): Adds directory structures and POSIX attributes using external databases (MySQL, PostgreSQL, Redis, etc.).

3. Replication & Data Protection

  • Configurable replication (e.g., same rack, different data center, or hybrid setups).
  • Erasure coding for warm data (reduces storage costs while maintaining availability).
  • Rack-aware and data-center-aware placement for fault tolerance.
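
As a rough illustration of how placement translates into copies, the sketch below decodes a three-digit replication code. It assumes the `xyz` convention described in the SeaweedFS replication documentation (extra copies in other data centers, on other racks, and on other servers in the same rack); `replica_copies` is a hypothetical helper, not part of the `weed` CLI.

```shell
# Assumption: replication is written as a three-digit code "xyz", where
#   x = extra copies in other data centers,
#   y = extra copies on other racks in the same data center,
#   z = extra copies on other servers in the same rack.
# replica_copies is a hypothetical helper that returns the total copy count.
replica_copies() {
  code=$1
  x=${code%??}            # first digit
  z=${code#??}            # last digit
  y=${code#?}; y=${y%?}   # middle digit
  echo $(( 1 + x + y + z ))
}

replica_copies 000   # 1 copy: no replication
replica_copies 001   # 2 copies: one extra on the same rack
replica_copies 110   # 3 copies: one on another rack, one in another data center
```

A code like this is typically supplied when starting the master (`weed master -defaultReplication=001`) or per request (`/dir/assign?replication=001`); verify both against the version you have installed.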

4. Cloud & Hybrid Storage Integration

  • Hot data (frequently accessed) stays on local servers for speed.
  • Warm data (less frequently accessed) is offloaded to cloud storage (AWS S3, Google Cloud, Azure, etc.) with O(1) access time.
  • Cost-efficient—minimizes cloud API costs by reducing unnecessary cloud access.

5. Multiple Access Methods

  • S3-compatible API (works with the AWS CLI, AWS SDKs, and other S3 clients such as the MinIO client).
  • WebDAV (mount as a network drive on Windows/Mac).
  • Hadoop/Spark/Flink integration (via Hadoop Compatible File System).
  • FUSE support (mount as a local filesystem).
  • REST API for direct HTTP uploads/downloads.

6. Enterprise-Grade Features

  • Automatic failover (no single point of failure).
  • TTL (Time-to-Live) for files (auto-deletion after expiration).
  • Encryption (AES-256-GCM) for secure storage.
  • Compression (automatic based on MIME type).
  • Active-Active Replication (cross-cluster sync for high availability).

How SeaweedFS Works 🔧

1. File Storage & Retrieval

  • Uploading a File

    • Client requests a file ID (fid) from the master server.
    • Master returns the fid, which embeds the volume ID, together with the URL of a volume server that hosts that volume.
    • Client uploads the file to the assigned volume server via HTTP.
    • File metadata (volume ID, offset, size) is stored on the volume server.
  • Downloading a File

    • Client queries the master for the volume server location using the file’s volume ID.
    • Client retrieves the file directly from the volume server via HTTP.
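
The upload and download steps above can be sketched with `curl`. This is a minimal sketch, not a definitive client: the function names are hypothetical, the addresses assume a local test cluster on the default ports, and the `/dir/assign` and `/dir/lookup` endpoints and the sample JSON are taken from the project's README, so check them against your version.

```shell
# Assumed default master address for a local test cluster.
MASTER=${MASTER:-localhost:9333}

# Step 1: ask the master for a file ID and a volume server URL.
# The README shows a JSON reply shaped like:
#   {"count":1,"fid":"3,01637037d6","url":"127.0.0.1:8080","publicUrl":"localhost:8080"}
assign_fid() {
  curl -s "http://$MASTER/dir/assign"
}

# Step 2: upload the file to the volume server under that file ID.
# $1 = local path, $2 = volume server URL, $3 = file ID
upload_file() {
  curl -s -F "file=@$1" "http://$2/$3"
}

# The volume ID is the part of the file ID before the comma.
fid_volume() {
  echo "${1%%,*}"
}

# Step 3: ask the master which volume servers hold the file's volume.
# $1 = file ID
lookup_volume() {
  curl -s "http://$MASTER/dir/lookup?volumeId=$(fid_volume "$1")"
}

# Step 4: download directly from the volume server.
# $1 = volume server URL, $2 = file ID, $3 = output path
download_file() {
  curl -s -o "$3" "http://$1/$2"
}
```

With a cluster running, a round trip might look like `upload_file ./photo.jpg 127.0.0.1:8080 3,01637037d6` followed by `download_file 127.0.0.1:8080 3,01637037d6 ./copy.jpg`, using the URL and file ID returned by `assign_fid`.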

2. Volume Management

  • Each volume is 32GB and can store many small files.
  • Volumes are statically assigned to files, making lookups O(1) (no complex hashing).
  • Replication is applied at the volume level (e.g., replicate a volume across 3 servers).
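
Because the volume is fixed when a file is written, the file ID itself carries everything needed to locate the data. The README describes an ID such as `3,01637037d6` as a volume ID, a comma, then a hex file key followed by an 8-hex-digit cookie. A small sketch of splitting one apart, assuming that layout (`fid_parts` is a hypothetical helper):

```shell
# Hypothetical helper: split a file ID into volume ID, file key, and cookie.
# Assumes the cookie is always the last 8 hex digits after the comma.
fid_parts() {
  volume=${1%%,*}          # everything before the comma
  rest=${1#*,}             # everything after the comma
  key=${rest%????????}     # drop the trailing 8 characters
  cookie=${rest#"$key"}    # what remains after removing the key prefix
  echo "volume=$volume key=$key cookie=$cookie"
}

fid_parts 3,01637037d6   # prints: volume=3 key=01 cookie=637037d6
```

No hashing or directory walk is involved: the client reads the volume ID from the string and asks the master where that volume lives.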

3. Filer (Directory Support)

  • The Filer is a separate service that adds directory structures.
  • Uses external databases (MySQL, PostgreSQL, Redis, etc.) to store directory metadata.
  • Supports POSIX attributes (permissions, timestamps, etc.).
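
With a Filer running, files can be addressed by path instead of by file ID. The sketch below assumes the Filer's default address `localhost:8888` and the HTTP conventions shown in the Filer API documentation (multipart upload to a directory path, JSON listing via an `Accept` header); the function names are hypothetical.

```shell
FILER=${FILER:-localhost:8888}   # assumed default Filer address

# Build the URL for a path on the Filer.
filer_url() {
  echo "http://$FILER$1"
}

# Upload a local file into a directory.
# $1 = local file, $2 = target directory ending in "/"
filer_put() {
  curl -s -F "file=@$1" "$(filer_url "$2")"
}

# List a directory as JSON.
# $1 = directory path ending in "/"
filer_ls() {
  curl -s -H "Accept: application/json" "$(filer_url "$1")"
}
```

For example, `filer_put ./report.pdf /docs/` followed by `filer_ls /docs/` would upload a file and then list the directory that contains it.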

Comparison with Other File Systems 🆚

| Feature | SeaweedFS | HDFS | GlusterFS | Ceph | MinIO |
|---|---|---|---|---|---|
| Optimized For | Small files, high concurrency | Large files, batch processing | General-purpose | General-purpose | S3-compatible object storage |
| Metadata Overhead | 40 bytes per file | High (centralized namenode) | Moderate | High (CRUSH algorithm) | Moderate |
| Scalability | Linear, no rebalancing | Limited by namenode | Limited by hashing | Complex (CRUSH) | Limited by sharding |
| Cloud Integration | Yes (hot/warm tiering) | Limited | No | Yes | Yes (native) |
| POSIX Support | Yes (via Filer) | Yes | Yes | Yes | No |
| Erasure Coding | Yes (warm data only) | Yes (Hadoop 3.0+) | Yes (dispersed volumes) | Yes | Yes (always on) |

Performance Benchmarks 🚀

  • Write 1M x 1KB files (16 concurrent connections):
    • 15,708 requests/sec (16.2 MB/s).
  • Read 1M files randomly (16 concurrent connections):
    • 47,019 requests/sec (48.5 MB/s).
  • Mixed workload (GET/PUT/DELETE/STAT):
    • 3.3 GB/s throughput (550 objects/sec).
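
The request rates and transfer rates above are linked by the average payload size. A quick sanity check, assuming decimal units (1 MB = 1,000,000 bytes) and using a hypothetical helper `bytes_per_op`:

```shell
# Hypothetical helper: average bytes moved per operation.
# $1 = throughput in bytes per second, $2 = operations per second
bytes_per_op() {
  echo $(( $1 / $2 ))
}

bytes_per_op 16200000 15708     # write test: about 1 KB per request
bytes_per_op 48500000 47019     # read test: about 1 KB per request
bytes_per_op 3300000000 550     # mixed test: about 6 MB per object
```

The first two results line up with the 1KB file size used in the benchmark. The third suggests the mixed workload used much larger objects, which would explain how a low object rate still yields a high byte throughput.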

Getting Started 🛠️

1. Quick Setup (Single Node)

# Download the binary
wget https://github.com/seaweedfs/seaweedfs/releases/latest/download/linux_amd64.tar.gz
tar -xvf linux_amd64.tar.gz

# Start a mini cluster (master + volume + filer + S3)
./weed mini -dir=/data
  • Access UIs:
    • Master: http://localhost:9333
    • Filer: http://localhost:8888
    • S3: http://localhost:8333

2. Production Setup (Multi-Node)

# Start master
./weed master

# Start volume servers
./weed volume -dir=/data1 -max=5 -master=localhost:9333 -port=8080
./weed volume -dir=/data2 -max=10 -master=localhost:9333 -port=8081

# Start filer (optional)
./weed filer -master=localhost:9333

3. Using S3 API

export AWS_ACCESS_KEY_ID=admin
export AWS_SECRET_ACCESS_KEY=key

# Start S3 gateway
./weed server -dir=/data -s3

# Use AWS CLI
aws --endpoint-url http://localhost:8333 s3 ls

Use Cases 🎯

  • Media Storage (images, videos, thumbnails)
  • Log & Backup Storage (efficient small file handling)
  • Hybrid Cloud Storage (local + cloud tiering)
  • Kubernetes Persistent Storage (via CSI driver)
  • S3-Compatible Object Storage (cheaper alternative to AWS S3)
  • High-Performance File Serving (CDN-like speed for static assets)


Enterprise Edition 🏢

  • Self-healing storage format (better data protection).
  • Priority support & consulting.
  • Advanced monitoring & management tools.
  • Visit seaweedfs.com for details.


Why Choose SeaweedFS? 🏆

  • Blazing fast (O(1) disk reads, optimized for small files).
  • Simple & scalable (no complex rebalancing, just add servers).
  • Cloud-friendly (hot/warm tiering, cost-efficient).
  • Flexible (S3, WebDAV, FUSE, Hadoop, Kubernetes support).
  • Open-source & enterprise-ready (Apache 2.0 + paid support).
