Reading notes: Systems Performance: Enterprise and the Cloud — Brendan Gregg
My thought: cloud computing greatly speeds up building a startup and cuts its cost. Even large listed companies, like Dropbox, Netflix, and Snapchat, rely on cloud providers to avoid data-center management costs. On the other side, offering IaaS lets large providers recoup infrastructure costs, and it opens a door to tons of value-added services on top. A big concern over the cloud is the opaqueness that comes with its virtualization, especially for performance monitoring & tuning. This chapter from Brendan, who came from Netflix, details his exploration of cloud performance there.
OS Virtualization
OS virtualization differs from hardware virtualization in one key way: only one kernel is running.
Advantages:
- Little overhead for guest apps: syscalls go directly to the host kernel.
- Memory allocated to the guest comes with no extra kernel tax.
- Unified file system cache (no double caching).
- All guest processes are observable from the host, allowing performance issues to be debugged.
- CPUs are real CPUs.
Disadvantages:
- Any kernel panic affects all guests.
- Guests cannot run different kernel versions.
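The observability advantage above can be sketched with a toy model (hypothetical data, not a real `/proc` walk): under a single shared kernel, the host sees every guest's processes, while each guest's view is filtered to its own namespace.

```python
# Toy model of process visibility under OS virtualization.
# One kernel tracks all processes; guests see a filtered subset.
processes = [
    {"pid": 101, "guest": "host",   "cmd": "sshd"},
    {"pid": 202, "guest": "guestA", "cmd": "nginx"},
    {"pid": 203, "guest": "guestA", "cmd": "worker"},
    {"pid": 301, "guest": "guestB", "cmd": "redis"},
]

def visible(viewer):
    """Host sees everything; a guest sees only its own namespace."""
    if viewer == "host":
        return processes
    return [p for p in processes if p["guest"] == viewer]

print(len(visible("host")))    # 4: host can debug any guest's processes
print(len(visible("guestA")))  # 2
```

This is why a host administrator can profile a misbehaving guest process directly, something a hardware hypervisor cannot do without guest cooperation.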
Overhead
CPU
CPU overhead while a thread is running in user mode is zero. Activities such as listing system state from the kernel may incur extra CPU overhead while other tenants' statistics are filtered out, but this is infrequent (~40 µs per 1,000 process entries).
I/O
I/O overhead is zero.
Other Tenants
CPU caches may have a lower hit ratio.
CPU execution may be interrupted for short periods while other tenants' devices perform interrupt service routines.
Resource contention.
Resource Controls
CPU
- Limit on guest CPU usage (no elasticity) to provide a consistent performance expectation.
- With elasticity, idle resources can be shared across tenants, but performance drops when another CPU-hungry tenant arrives.
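The trade-off between the two policies can be sketched as a toy model (not a real scheduler): a hard cap gives predictable performance, while elastic fair-share lets a busy tenant soak up idle CPU, shrinking back when others wake up.

```python
# Toy CPU-allocation model: hard cap vs elastic fair-share (water-filling).
def hard_cap(demand, cap):
    """No elasticity: a tenant never exceeds its cap, even if CPUs idle."""
    return min(demand, cap)

def fair_share(demands, capacity):
    """Elastic: split capacity equally; spill unused share to busy tenants."""
    alloc = {t: 0.0 for t in demands}
    unsatisfied = set(demands)
    remaining = capacity
    while unsatisfied and remaining > 1e-9:
        slice_ = remaining / len(unsatisfied)
        remaining = 0.0
        for t in list(unsatisfied):
            take = min(demands[t] - alloc[t], slice_)
            alloc[t] += take
            remaining += slice_ - take   # unused share goes back in the pool
            if alloc[t] >= demands[t] - 1e-9:
                unsatisfied.discard(t)
    return alloc

# Tenant A wants 8 CPUs, B wants 1, machine has 4:
print(hard_cap(8.0, 2.0))                    # 2.0 (predictable, wastes idle CPU)
print(fair_share({"A": 8.0, "B": 1.0}, 4.0)) # A absorbs B's idle share: A=3.0, B=1.0
```

The elastic case shows exactly the drop described above: if B's demand later rises to 2, A's allocation falls from 3 back to 2.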
Memory
- Main memory.
- Virtual memory.
File System
- I/O throttling (ZFS), implemented by injecting delays before I/O completions are returned to user space.
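The delay-injection technique can be sketched as follows (a toy model of the idea, not ZFS's actual implementation): each completion is held until the tenant's cumulative I/O rate drops back under its limit.

```python
# Toy throttle: compute how long to delay an I/O completion so the
# tenant's observed throughput never exceeds limit_bps.
def completion_delay(bytes_done, io_bytes, limit_bps, now):
    """Delay (s) before this I/O may complete without exceeding the limit.

    bytes_done: bytes already completed since the accounting window began
    io_bytes:   size of the I/O now finishing
    now:        seconds since the window began
    """
    earliest = (bytes_done + io_bytes) / limit_bps
    return max(0.0, earliest - now)

# 10 MB already done, another 10 MB finishing at t=0.1 s under a
# 100 MB/s limit: the completion must be held back until t=0.2 s.
print(completion_delay(10e6, 10e6, 100e6, 0.1))  # ~0.1 s of injected delay
```

Because the delay is applied on the completion path, user-space simply observes slower I/O; no requests are rejected.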
Hardware Virtualization
Full Virtualization — binary translation
Uses a mixture of direct processor execution and binary translations of instructions when needed.
Full Virtualization — hardware-assisted
Provides a complete virtual system composed of virtualized hardware components onto which an unmodified OS can be installed.
Paravirtualization
Provides a virtual system that includes an interface for the guest OS to efficiently use host resources (via hypercalls).
Virtual machines are created and executed by a hypervisor. There are two types of hypervisors.
- Type 1 executes directly on the processors, not as kernel- or user-level software of another host. Hypervisor administration may be performed by a privileged guest. This type is also called a native or bare-metal hypervisor.
Guest OS #0 | Guest OS | Guest OS
Host Admin | G Kernel | G Kernel
----------------------------------
Hypervisor (Scheduler)
----------------------------------
Hardware (Processors)
- Type 2 is executed by the host OS kernel and may be composed of kernel-level modules and user-level processes. The host OS has privileges to administer the hypervisor and launch new guests.
Guest OS | Guest OS | Guest OS
G Kernel | G Kernel | G Kernel
----------------------------------
Hypervisor
----------------------------------
Host OS
----------------------------------
Host Kernel (Scheduler)
----------------------------------
Hardware (Processors)
Overhead
CPU
Overheads may be encountered when making privileged processor calls, accessing hardware and mapping main memory.
- Binary translation: guest kernel instructions that operate on physical resources are identified and translated.
- Paravirtualization: instructions in the guest OS that must be virtualized are replaced with hypercalls to the hypervisor.
- Hardware-assisted: unmodified guest kernel instructions that operate on hardware are handled by the hypervisor, which runs a VMM at a ring below 0.
Hardware assisted virtualization is generally preferred.
The rate of transitions between the guest and the hypervisor, as well as the time spent in the hypervisor, can be studied as metrics of CPU overhead.
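The two metrics combine into a back-of-envelope overhead estimate: transition rate times average time per transition gives the fraction of a CPU lost to the hypervisor. The numbers below are made up for illustration.

```python
# Back-of-envelope: CPU overhead fraction from guest<->hypervisor transitions.
def hypervisor_overhead(exits_per_sec, avg_exit_seconds):
    """Fraction of one CPU consumed by hypervisor transitions."""
    return exits_per_sec * avg_exit_seconds

# 50k exits/s at 2 us each burns 10% of a CPU:
print(hypervisor_overhead(50_000, 2e-6))  # 0.1
```

This is why reducing the exit rate (e.g., via paravirtualization or hardware assists) matters more than micro-optimizing individual exits.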
Memory Mapping
For virtualization, mapping a new page of memory from the guest to hardware involves two steps:
- virtual-to-guest-physical translation, performed by the guest kernel.
- guest-physical-to-host-physical translation, performed by the hypervisor VMM.
The second step can be cached in the TLB. Modern processors also support MMU virtualization, so mappings that have left the TLB can be recalled in hardware alone, without calling into the hypervisor (Intel EPT, AMD NPT).
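The two-stage walk and the TLB's role can be sketched as a toy model (page tables as plain dicts; real hardware walks multi-level tables at page granularity):

```python
# Toy two-stage address translation with a TLB caching the combined mapping.
guest_pt = {0x1000: 0x8000}   # guest-virtual page -> guest-physical page
host_pt  = {0x8000: 0x42000}  # guest-physical page -> host-physical page
tlb = {}

def translate(gva):
    if gva in tlb:              # hit: skip both translation steps
        return tlb[gva], "tlb"
    gpa = guest_pt[gva]         # step 1: guest kernel's page table
    hpa = host_pt[gpa]          # step 2: hypervisor's (EPT/NPT-style) table
    tlb[gva] = hpa              # cache the combined gva -> hpa mapping
    return hpa, "walk"

hpa, how = translate(0x1000)
print(hex(hpa), how)            # 0x42000 walk  (full two-stage walk)
hpa, how = translate(0x1000)
print(hex(hpa), how)            # 0x42000 tlb   (cached, no hypervisor involved)
```

Hardware MMU virtualization performs the `"walk"` path in silicon, which is why misses are cheaper than trapping to the hypervisor.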
Memory Size
Double caching: both the guest and the host may cache the same file system pages, a small memory-size overhead.
I/O
A key cost of hardware virtualization is the overhead of performing device I/O. For CPU and memory access, the common path can be set up to execute in a bare-metal fashion, but device I/O must be translated by the hypervisor. This overhead can be mitigated by paravirtualized drivers or PCI pass-through.
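Why paravirtualized drivers help can be sketched with a toy exit-count model (illustrative only): with an emulated device, each request can trap to the hypervisor, while a virtio-style shared ring lets the guest post many requests and notify ("kick") once per batch.

```python
import math

# Toy model: guest exits per N I/O requests, emulated vs paravirtualized.
def exits_emulated(requests):
    """Roughly one trap to the hypervisor per emulated-device request."""
    return requests

def exits_paravirt(requests, batch):
    """One notification per batch posted to the shared ring."""
    return math.ceil(requests / batch)

print(exits_emulated(1000))       # 1000 exits
print(exits_paravirt(1000, 32))   # 32 exits
```

Combined with the per-exit cost estimate earlier, cutting 1000 exits down to 32 turns a dominant overhead into a minor one; PCI pass-through goes further by eliminating hypervisor translation from the data path entirely.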