GitHub - seaweedfs/seaweedfs

Overview

SeaweedFS is an open-source, distributed file system designed for high scalability and fast file access. It efficiently stores billions of files with minimal metadata overhead (40 bytes per file) and supports features like replication, erasure coding, cloud integration, and POSIX-compatible directories. Built for simplicity, it offers O(1) disk read operations, making it ideal for small files while also handling large files via chunking. SeaweedFS includes tools like S3-compatible APIs, WebDAV, and Kubernetes CSI support, and is licensed under Apache 2.0.

What is SeaweedFS? 🌊📁

SeaweedFS is a distributed file system designed to store and serve billions of files efficiently while maintaining high performance. It is open-source (Apache 2.0 licensed) and optimized for small files, though it can handle large files via chunking. Unlike traditional file systems, SeaweedFS minimizes metadata overhead and avoids bottlenecks by distributing file metadata across volume servers rather than centralizing it.

Core Features ✨

1. High Scalability & Performance

Stores billions of files with minimal overhead (just 40 bytes per file for metadata).
O(1) disk read operations—files are accessed in a single disk read, making it extremely fast for small files.
Linear scalability—add more volume servers to increase storage capacity without complex rebalancing.

2. Distributed Architecture

Master Server: Manages volume locations (static metadata) and assigns file IDs.
Volume Servers: Store actual file data and manage file metadata (volume ID, offset, size).
Filer (Optional): Adds directory structures and POSIX attributes using external databases (MySQL, PostgreSQL, Redis, etc.).

3. Replication & Data Protection

Configurable replication (e.g., same rack, different data center, or hybrid setups).
Erasure coding for warm data (reduces storage costs while maintaining availability).
Rack-aware and data-center-aware placement for fault tolerance.
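
As a rough illustration of the placement options above: the upstream SeaweedFS documentation describes replication as a three-digit code `xyz`, where `x` is the number of extra copies in other data centers, `y` the extra copies on other racks in the same data center, and `z` the extra copies on other servers in the same rack. That encoding is not spelled out in this summary, so treat the sketch below as an assumption to check against the version you run. The helper name `replication_copies` is hypothetical.

```shell
#!/bin/sh
# Total number of copies implied by a three-digit replication code "xyz"
# (assumed encoding: x = other data centers, y = other racks, z = same rack).
replication_copies() {
  code=$1
  case "$code" in
    [0-9][0-9][0-9]) ;;                       # exactly three digits
    *) echo "invalid replication code: $code" >&2; return 1 ;;
  esac
  x=${code%??}      # first digit
  rest=${code#?}    # last two digits
  y=${rest%?}       # middle digit
  z=${code#??}      # last digit
  echo $((1 + x + y + z))   # the original plus every extra copy
}

# Example: replication_copies 001   (one extra copy on the same rack)
# Example: replication_copies 110   (one copy on another rack, one in another data center)
```

A code like this is typically passed when a file ID is requested or set as a cluster default, so the same three digits decide both how many copies exist and where they land.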

4. Cloud & Hybrid Storage Integration

Hot data (frequently accessed) stays on local servers for speed.
Warm data (less frequently accessed) is offloaded to cloud storage (AWS S3, Google Cloud, Azure, etc.) with O(1) access time.
Cost-efficient—minimizes cloud API costs by reducing unnecessary cloud access.

5. Multiple Access Methods

S3-compatible API (works with the AWS CLI, AWS SDKs, and S3 clients such as the MinIO client).
WebDAV (mount as a network drive on Windows/Mac).
Hadoop/Spark/Flink integration (via Hadoop Compatible File System).
FUSE support (mount as a local filesystem).
REST API for direct HTTP uploads/downloads.

6. Enterprise-Grade Features

Automatic failover (no single point of failure).
TTL (Time-to-Live) for files (auto-deletion after expiration).
Encryption (AES-256-GCM) for secure storage.
Compression (automatic based on MIME type).
Active-Active Replication (cross-cluster sync for high availability).

How SeaweedFS Works 🔧

1. File Storage & Retrieval

Uploading a File
- Client requests a file ID (fid) from the master server.
- Master returns the fid (which embeds the volume ID) together with the URL of a volume server.
- Client uploads the file to the assigned volume server via HTTP.
- File metadata (volume ID, offset, size) is stored on the volume server.
Downloading a File
- Client queries the master for the volume server location using the file’s volume ID.
- Client retrieves the file directly from the volume server via HTTP.
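
The upload and download steps above can be sketched with `curl`. This is a minimal sketch, assuming a master on `localhost:9333` and the `/dir/assign` and `/dir/lookup` endpoints as shown in the upstream README. The helper names (`json_field`, `fid_volume_id`, `upload_file`, `download_file`) are hypothetical, and the JSON extraction is deliberately naive; a real script would more likely use `jq`.

```shell
#!/bin/sh
MASTER="${MASTER:-localhost:9333}"   # assumed master address

# Pull one string field out of a compact, flat JSON reply (naive; no jq needed).
json_field() {  # usage: json_field NAME JSON
  printf '%s' "$2" | sed -n "s/.*\"$1\":\"\([^\"]*\)\".*/\1/p"
}

# The volume ID is the part of the fid before the comma, e.g. 3 in "3,01637037d6".
fid_volume_id() {
  printf '%s\n' "${1%%,*}"
}

# Ask the master for a fid, then POST the file to the assigned volume server.
upload_file() {  # usage: upload_file PATH ; prints the fid
  assign=$(curl -s "http://$MASTER/dir/assign") || return 1
  fid=$(json_field fid "$assign")
  url=$(json_field url "$assign")
  curl -s -F "file=@$1" "http://$url/$fid" >/dev/null || return 1
  printf '%s\n' "$fid"
}

# Look up which volume server holds the fid's volume, then fetch the file from it.
download_file() {  # usage: download_file FID OUTPUT_PATH
  lookup=$(curl -s "http://$MASTER/dir/lookup?volumeId=$(fid_volume_id "$1")") || return 1
  url=$(json_field url "$lookup")
  curl -s -o "$2" "http://$url/$1"
}

# Example (needs a running cluster):
#   fid=$(upload_file ./photo.jpg)
#   download_file "$fid" ./photo_copy.jpg
```

Note that the master is only consulted for the volume location; the file bytes travel directly between the client and the volume server.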

2. Volume Management

Each volume is 32GB and can store many small files.
Each file is statically assigned to a volume (the volume ID is part of its file ID), so locating a file is an O(1) lookup with no complex hashing.
Replication is applied at the volume level (e.g., replicate a volume across 3 servers).
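
For rough capacity planning, the 32GB volume size stated above can be turned into simple arithmetic. This is only a back-of-the-envelope sketch: the function names are hypothetical, it ignores per-file overhead, and it assumes each replica of a volume occupies its own volume slot on some server.

```shell
#!/bin/sh
VOLUME_GB=32   # per-volume size as stated in this section

# Volume slots needed to hold DATA_GB of data with COPIES copies of each volume (rounded up).
volumes_needed() {  # usage: volumes_needed DATA_GB [COPIES]
  echo $(( ($1 * ${2:-1} + VOLUME_GB - 1) / VOLUME_GB ))
}

# Raw capacity in GB of a volume server limited to N volumes.
server_capacity_gb() {  # usage: server_capacity_gb N
  echo $(( $1 * VOLUME_GB ))
}

# Example: volumes_needed 1000 3    (1000 GB of data, three copies of every volume)
# Example: server_capacity_gb 5     (a server capped at five volumes)
```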

3. Filer (Directory Support)

The Filer is a separate service that adds directory structures.
Uses external databases (MySQL, PostgreSQL, Redis, etc.) to store directory metadata.
Supports POSIX attributes (permissions, timestamps, etc.).
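
Because the Filer exposes directories over HTTP, path-based access can be sketched with `curl` as well. The sketch below assumes a filer on `localhost:8888` and the multipart-upload and JSON-listing behavior described in the upstream filer documentation; the wrapper names (`filer_url`, `filer_put`, `filer_ls`, `filer_get`) are hypothetical.

```shell
#!/bin/sh
FILER="${FILER:-localhost:8888}"   # assumed filer address

# Build a filer URL from a path, adding a leading slash and collapsing duplicate slashes.
filer_url() {  # usage: filer_url PATH
  printf 'http://%s%s\n' "$FILER" "$(printf '/%s' "$1" | sed 's://*:/:g')"
}

# Upload a local file into a directory (the trailing slash keeps the original file name).
filer_put() {  # usage: filer_put LOCAL_FILE REMOTE_DIR
  curl -s -F "file=@$1" "$(filer_url "$2/")"
}

# List a directory as JSON.
filer_ls() {   # usage: filer_ls REMOTE_DIR
  curl -s -H 'Accept: application/json' "$(filer_url "$1/")"
}

# Download a file by its full path.
filer_get() {  # usage: filer_get REMOTE_PATH LOCAL_FILE
  curl -s -o "$2" "$(filer_url "$1")"
}

# Example (needs a running filer):
#   filer_put ./report.js /javascript
#   filer_ls /javascript
#   filer_get /javascript/report.js ./report_copy.js
```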

Comparison with Other File Systems 🆚

Performance Benchmarks 🚀

Write 1M x 1KB files (16 concurrent connections):
- 15,708 requests/sec (16.2 MB/s).
Read 1M files randomly (16 concurrent connections):
- 47,019 requests/sec (48.5 MB/s).
Mixed workload (GET/PUT/DELETE/STAT):
- 3.3 GB/s throughput (550 objects/sec).

Getting Started 🛠️

1. Quick Setup (Single Node)

# Download the binary
wget https://github.com/seaweedfs/seaweedfs/releases/latest/download/linux_amd64.tar.gz
tar -xvf linux_amd64.tar.gz

# Start a mini cluster (master + volume + filer + S3)
./weed mini -dir=/data

Default endpoints:
- Master: http://localhost:9333
- Filer: http://localhost:8888
- S3: http://localhost:8333

2. Production Setup (Multi-Node)

# Start master
./weed master

# Start volume servers
./weed volume -dir=/data1 -max=5 -master=localhost:9333 -port=8080
./weed volume -dir=/data2 -max=10 -master=localhost:9333 -port=8081

# Start filer (optional)
./weed filer -master=localhost:9333

3. Using S3 API

export AWS_ACCESS_KEY_ID=admin
export AWS_SECRET_ACCESS_KEY=key

# Start S3 gateway
./weed server -dir=/data -s3

# Use AWS CLI
aws --endpoint-url http://localhost:8333 s3 ls
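
To avoid repeating the endpoint on every call, a small wrapper can help. This is a sketch, assuming the AWS CLI is installed and the gateway listens on `http://localhost:8333`; `swfs_s3` and `demo-bucket` are hypothetical names used only for illustration.

```shell
#!/bin/sh
S3_ENDPOINT="${S3_ENDPOINT:-http://localhost:8333}"   # assumed gateway address

# Run any "aws s3" subcommand against the SeaweedFS S3 gateway.
swfs_s3() {
  aws --endpoint-url "$S3_ENDPOINT" s3 "$@"
}

# Example session (needs a running gateway and the credentials exported above):
#   swfs_s3 mb s3://demo-bucket
#   swfs_s3 cp ./photo.jpg s3://demo-bucket/photo.jpg
#   swfs_s3 ls s3://demo-bucket
```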

Use Cases 🎯

✅ Media Storage (images, videos, thumbnails)
✅ Log & Backup Storage (efficient small file handling)
✅ Hybrid Cloud Storage (local + cloud tiering)
✅ Kubernetes Persistent Storage (via CSI driver)
✅ S3-Compatible Object Storage (cheaper alternative to AWS S3)
✅ High-Performance File Serving (CDN-like speed for static assets)

Enterprise Edition 🏢

Self-healing storage format (better data protection).
Priority support & consulting.
Advanced monitoring & management tools.
Visit seaweedfs.com for details.

Community & Support 🤝

GitHub: https://github.com/seaweedfs/seaweedfs
Slack: SeaweedFS Slack
Twitter: @seaweedfs
Documentation: Wiki
Sponsorship: Patreon

Why Choose SeaweedFS? 🏆

✔ Blazing fast (O(1) disk reads, optimized for small files).
✔ Simple & scalable (no complex rebalancing, just add servers).
✔ Cloud-friendly (hot/warm tiering, cost-efficient).
✔ Flexible (S3, WebDAV, FUSE, Hadoop, Kubernetes support).
✔ Open-source & enterprise-ready (Apache 2.0 + paid support).
