Analysis

Big Data and In-Memory Database
By: Joe Clabby, President, Clabby Analytics
Published: 18th March 2013
Copyright Clabby Analytics © 2013

For some unknown reason, the topic of memory has come up a lot in my research this week. It started when I was comparing the cache design of IBM's POWER7+ microprocessor with that of Intel's i7 x86 architecture, then moved on to how much main memory each system could support, and then I chose to add an IBM System z mainframe to the discussion.

As I looked at each processor environment, here's what I found:

  • The System z has a tremendous amount of on-chip cache: 960 KB of Level 1, 12 MB of Level 2, 48 MB of Level 3, and 384 MB of Level 4.
  • The POWER7+ has 512 KB of Level 1, 2 MB of Level 2, and 80 MB of Level 3.
  • The Intel chip I chose (an E7-8870, for its high core count and representative clock speed) has 480 KB of Level 1, 2 MB of Level 2, and 30 MB of Level 3.

Why is this important? Because the closer data sits to the processor, the faster it can be processed.
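
To make that concrete, here is a minimal sketch (in Python, using numpy; the array sizes are illustrative assumptions of mine) that compares the per-element cost of processing an array small enough to stay in cache against one that spills out to main memory:

    # Minimal sketch: per-element cost of summing a cache-resident array
    # versus one that spills to main memory. Sizes are illustrative.
    import time
    import numpy as np

    def per_element_ns(n, repeats=20):
        a = np.ones(n, dtype=np.float64)
        a.sum()  # warm-up pass so the data is resident before timing
        t0 = time.perf_counter()
        for _ in range(repeats):
            a.sum()
        return (time.perf_counter() - t0) / (repeats * n) * 1e9

    print(f"cache-resident (32 KB):  {per_element_ns(4_000):.3f} ns/element")
    print(f"DRAM-resident (320 MB):  {per_element_ns(40_000_000):.3f} ns/element")

On most machines the small array shows a noticeably lower per-element cost, which is the whole argument for big caches captured in one number.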

I then started to look at main memory for each system. And here's what I found.

  • The System z can address up to 3 TB of main memory.
  • The POWER7+ can address up to 4 TB of main memory (in a Power 770 configuration).
  • The E7 chip's specifications say it can address up to 512 GB. Last I looked, Hewlett-Packard's BladeSystem topped out at 576 GB; Cisco's UCS B230 M2 topped out at 512 GB; and Dell's blade environment could address 640 GB (though, as I recollect, at three memory DIMMs per channel, using all 640 GB may result in unbalanced performance). Still, I have run across some vendor non-blade configurations with around 1.5 TB of memory. And I think IBM's MAX5 architecture can take a blade up to 1 TB.

Why is this important? Again, because the more data that you can place near the processor, the faster that data can be processed.
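
As a back-of-the-envelope illustration (the row count and row width below are assumptions of mine, not vendor figures), here is a quick check of whether a working set fits entirely in memory on each class of system:

    # Hypothetical capacity check: does the working set fit in RAM?
    # The row count and row width are assumed for illustration only.
    TB = 1024 ** 4
    GB = 1024 ** 3

    systems = {                     # maximum memory figures cited above
        "System z":      3 * TB,
        "POWER7+ 770":   4 * TB,
        "E7 (per spec)": 512 * GB,
    }

    working_set = 5_000_000_000 * 200   # 5 billion rows x 200 bytes/row

    for name, capacity in systems.items():
        verdict = "fits in memory" if working_set <= capacity else "spills to storage"
        print(f"{name:14s} {capacity / TB:4.1f} TB -> {verdict}")

At roughly 0.9 TB, that working set lives comfortably in memory on the first two systems and spills to storage on the third.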

I then started to think about IBM's Flex System architecture. This environment can run POWER and/or x86 chips (note: POWER chips can process twice as many threads and have significantly more on-chip cache). It has access to plenty of main memory. Each compute node can also hold up to eight internal solid state drives that act as extended memory and accelerate applications that benefit from high IOPS (input/output operations per second) performance. Applications that perform extremely well in a Flex System environment include various data mining and database applications, multimedia streaming and video-on-demand, a wealth of financial services applications (which rely on fast results for quick decision making), surveillance and security applications (especially real-time security checks against reference materials), and video rendering. I then asked myself: are Big Data applications appropriate for this environment?
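
The IOPS side of that question can be gauged with a rough probe like the one below, pointed at an SSD-backed path versus a spinning-disk path. This is only a sketch: the path is a placeholder, and a serious measurement would bypass the operating system's page cache (with O_DIRECT, or a tool such as fio).

    # Rough random-read IOPS probe (Unix-only: uses os.pread).
    # Illustrative only; real measurements need O_DIRECT or a tool like
    # fio, since the OS page cache will otherwise absorb many reads.
    import os
    import random
    import time

    def random_read_iops(path, reads=2000, block=4096):
        size = os.path.getsize(path)
        fd = os.open(path, os.O_RDONLY)
        try:
            t0 = time.perf_counter()
            for _ in range(reads):
                offset = random.randrange(0, max(size - block, 1))
                os.pread(fd, block, offset)
            return reads / (time.perf_counter() - t0)
        finally:
            os.close(fd)

    # print(random_read_iops("/path/to/large/testfile"))  # placeholder path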

This meant I had to venture away from memory into storage (storage feeds memory). IBM's PureSystems/Flex System architecture offers access to large amounts of internal storage (blades typically do not). IBM's Storwize V7000 storage array can be mounted within a Flex System environment, and can thus speed access to data (no need for multiple hops). Additionally, PureSystems/Flex System gives compute nodes direct access to up to eight SSDs located within each node. These SSDs act like extended, fast memory, and are also positioned to serve 'hot data' rapidly to the compute elements.
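
That "extended memory" role can be approximated in software by memory-mapping an SSD-resident file into a process's address space; the file path below is a hypothetical placeholder, not anything from the Flex System tooling.

    # Sketch: treating an SSD-backed file as extended memory via mmap.
    # "/flash/hot_data.bin" is a hypothetical node-local SSD path.
    import mmap
    import os

    fd = os.open("/flash/hot_data.bin", os.O_RDONLY)
    buf = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)  # map whole file read-only
    header = buf[:16]       # touching pages faults them in from the SSD
    buf.close()
    os.close(fd)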

I then started thinking outside the box (literally: about external storage subsystems). IBM's storage offerings are particularly strong in tiering (placing the most frequently used data on fast disk for fast access), in compression, and in interoperability. But it is the tiering that interests me most because, yet again, it places hot data closer to the processor. And the closer that data is to the processor, the faster it can be processed...
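
The tiering logic itself is simple to sketch. The toy class below promotes data to a small "fast tier" on access and demotes the least-recently-used items; real arrays migrate extents based on longer-term heat statistics, but the hot-data-moves-closer idea is the same.

    # Toy tiering sketch: a small fast tier in front of a large slow tier.
    # Real storage tiering migrates extents on heat statistics; this only
    # illustrates hot data being promoted closer to the processor.
    from collections import OrderedDict

    class TieredStore:
        def __init__(self, fast_capacity, slow_tier):
            self.fast = OrderedDict()       # hot data on the fast tier
            self.fast_capacity = fast_capacity
            self.slow = slow_tier           # cold data on capacity disk

        def read(self, key):
            if key in self.fast:            # hot hit: serve from fast tier
                self.fast.move_to_end(key)
                return self.fast[key]
            value = self.slow[key]          # cold read: promote the data
            self.fast[key] = value
            if len(self.fast) > self.fast_capacity:
                self.fast.popitem(last=False)   # demote least-recently-used
            return value

    store = TieredStore(fast_capacity=2, slow_tier={"a": 1, "b": 2, "c": 3})
    store.read("a"); store.read("b"); store.read("a"); store.read("c")
    print(list(store.fast))   # ['a', 'c']: the hottest keys sit in the fast tier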

What I think we're going to see soon is systems designed around in-memory database processing. Traditional blade architecture is not positioned to support very large memory (VLM) databases due to memory/footprint constraints. But other architectures such as traditional mainframes, Power Systems, and scale-up x86 designs are indeed well positioned for in-memory database processing. 
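
It helps to recall how little ceremony an in-memory database needs once the data fits in RAM. SQLite's ":memory:" mode, used purely as a stand-in here (it is not one of the VLM-class engines or products discussed above), keeps the whole table resident so queries never touch disk:

    # Quick illustration of in-memory database processing with SQLite's
    # ":memory:" mode; a stand-in for VLM-class engines, not a product demo.
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
    con.executemany(
        "INSERT INTO events (region, amount) VALUES (?, ?)",
        [("emea", 10.0), ("emea", 25.5), ("apac", 7.25)],
    )
    for region, total in con.execute(
            "SELECT region, SUM(amount) FROM events GROUP BY region"):
        print(region, total)   # every scan and aggregate happens in RAM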

Next week I'm starting a research report on system designs and will discuss this topic in greater depth. In the meantime, I would welcome any feedback and thoughts from readers of this article. Please consider dropping me an e-mail or commenting on this article.

