In previous posts we’ve talked about machine-generated data, and how Splunk allows you to slice, dice and analyze that data. With Splunk you can collect data from all kinds of sources — security systems, sensors, enterprise applications and more — regardless of format or location. Splunk then applies powerful indexing, search, analysis and visualization capabilities to provide real-time operational intelligence and sophisticated security information and event management.
Understanding Splunk Buckets
Because indexing requires significant resources, Splunk divides data into five tiers or “buckets”:
- Hot — the most current, active data that is being searched
- Warm — read-only data that is still being searched
- Cold — archived read-only data that is rarely searched
- Frozen — data that is pushed off active storage or deleted
- Thawed — frozen data that has been restored and does not age
Splunk can automatically move data from one tier to the next based on user-defined policies and available storage capacity. Because Splunk aggregates machine-generated data from multiple sources, data storage volumes can quickly grow to many terabytes or even petabytes spread across billions of files.
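These policies are expressed in Splunk's indexes.conf. The sketch below is a minimal illustration of how an administrator might control bucket aging; the index name, paths and thresholds are hypothetical assumptions, not recommendations for any particular environment.

```
# indexes.conf -- hypothetical index; name, paths and limits are illustrative
[web_events]
homePath   = $SPLUNK_DB/web_events/db         # hot and warm buckets
coldPath   = $SPLUNK_DB/web_events/colddb     # cold buckets
thawedPath = $SPLUNK_DB/web_events/thaweddb   # restored (thawed) buckets

maxDataSize = auto_high_volume       # roll hot buckets to warm at roughly 10 GB
maxWarmDBCount = 300                 # roll the oldest warm buckets to cold beyond 300
frozenTimePeriodInSecs = 7776000     # freeze data older than 90 days
maxTotalDataSizeMB = 512000          # freeze oldest buckets past ~500 GB total

# Archive frozen buckets instead of deleting them
coldToFrozenDir = /archive/splunk/web_events
```

Note that if neither coldToFrozenDir nor coldToFrozenScript is set, Splunk deletes buckets outright when they freeze, so retention settings deserve careful review.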
Common Challenges with Splunk
As a result of this growth, Splunk requires a high-performance, high-capacity, tiered storage infrastructure, often in a clustered deployment. High-performance storage maximizes end-user productivity by enabling fast searches and queries. Advanced features such as inline de-duplication, compression, snapshots and cloning help keep data volumes manageable.
However, customers often adopt Splunk on a limited scale, expecting to expand the deployment as the solution shows value. In this scenario, Splunk may be installed on commodity server hardware with direct-attached storage (DAS). This provides a good starting point for small Splunk environments, but DAS becomes difficult to manage as the environment grows. Organizations soon find that they are unable to ensure fast querying and search of hot and warm data, and that they need greater capacity for cold and frozen data. Over time, the complexity and inefficiency of the DAS model can significantly increase total cost of ownership.
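Before re-architecting, it is worth measuring how data is actually distributed across tiers on the existing DAS. One way to do this is with Splunk's dbinspect search command, which reports per-bucket state and size; the index name below is an assumption for illustration.

```
| dbinspect index=web_events
| stats count AS buckets, sum(sizeOnDiskMB) AS totalMB BY state
```

A large share of warm and cold data sitting on the same disks as hot buckets is a strong hint that tiered storage will pay off.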
The Solutions Are Easy
A hybrid storage environment of all-flash arrays and scale-out, network-attached storage (NAS) addresses these challenges. All-flash arrays deliver higher performance and lower latency without complex tuning or setup. Enterprise-class flash arrays also perform de-duplication and compression without performance penalties and provide advanced cloning services without consuming additional capacity. In addition, all-flash arrays enable organizations to decouple compute and storage so that each can be scaled independently, and provide more efficient centralized management and improved data protection.
A scale-out NAS cluster creates a unified pool of highly efficient storage that can be expanded automatically to accommodate growing volumes of cold and frozen data. Simplified management reduces storage administration costs, and there is no need to over-provision storage to meet performance and capacity requirements.
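In configuration terms, this hybrid layout maps naturally onto Splunk volumes: hot and warm buckets live on the all-flash array, while cold buckets move to the NAS cluster. The following sketch assumes hypothetical mount points and size caps.

```
# indexes.conf -- hypothetical volume layout for a hybrid flash/NAS deployment
[volume:flash]
path = /mnt/flash/splunk              # all-flash array mount
maxVolumeDataSizeMB = 2000000         # cap hot/warm usage at ~2 TB

[volume:nas]
path = /mnt/nas/splunk                # scale-out NAS mount

[web_events]
homePath   = volume:flash/web_events/db        # hot/warm on flash
coldPath   = volume:nas/web_events/colddb      # cold on NAS
thawedPath = $SPLUNK_DB/web_events/thaweddb    # thawedPath cannot reference a volume
```

When the flash volume reaches its cap, Splunk rolls the oldest warm buckets to cold on the NAS automatically, so the tiers largely enforce themselves.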
Getting Started with Splunk
Organizations looking to gain maximum value from Splunk need a storage architecture that minimizes search times, enables storage tiering and keeps data volumes in check.
As a Splunk Partner, we can help take your Splunk deployment to the next level: our engineers will architect a robust storage infrastructure to meet your goals for performance, scalability and cost.
September 28, 2015