Thursday, 22 December 2016

FileServer or WebServer, that is the question...

I recently carried out an interesting project: a feasibility and cost analysis for migrating an aging, though still passably functional, FileServer.
You know those good old towers full of disks, forgotten “somewhere in some closet”?

Well, exactly one of those.

The as-is environment consists of a headquarters and three branch offices spread across the Emilia region, meshed over MPLS and with excellent internet connectivity. At the headquarters we had our nice big server sharing folders and files with the domain users of the headquarters and of all the remote sites.

Besides the need to refresh the hardware, the customer complained that users at the remote sites had serious trouble accessing the content and files on the headquarters FileServer.

A quick calculation showed that increasing the MPLS bandwidth was out of the question for budget reasons, while an upgrade of the internet bandwidth, if needed at all, would cost very little by comparison.
Another detail that could not be ignored was that the customer did not want to buy hardware but to “put it in the CLOUD” (how many times have you heard that one?) and wanted to spend nothing, or close to it (see the by now famous everything-now-cheap combination).

Where I started..
Since the System Integrator I work for has long been delivering value-added services, I decided to take one of them in particular as a base and adapt it as needed to reach the final goal.

What are we talking about?

Quite “simply”, it means building a LAMP platform (in my case Ubuntu 16.04 LTS, Apache 2.4, MySQL and PHP 7), implementing and customizing the WebDAV (“Web-based Distributed Authoring and Versioning”) extensions to HTTP on top of it, and making the result usable and secure.

In this particular case the choice fell on WebDAV because, among its features, it is free; it allows creating, deleting, modifying and moving files located on a web server; it includes a locking mechanism (protection against files being overwritten) and properties (creating, removing and querying information such as the author, the modification date, and so on); and it can be accessed simply, securely and reliably.

The environment is then connected over VPN to the customer's AD, so that access to the file server works in SSO.
Users, whether internal or mobile, can create a custom network location pointing to the path of the WebDAV service.
Various clients are available on the market; in this case, the OS on the customer's PCs already supported the protocol natively (right click, add a custom network location, https://webserver/condivisione).
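For readers who prefer to see what a WebDAV client does under the hood, here is a minimal sketch in Python, using only the standard library. It builds the XML body of a PROPFIND request (the WebDAV method for listing a collection and its properties) and extracts the paths from a multistatus reply. The sample reply and the file name `report.docx` are made up for illustration; they are not taken from the customer's server. A real client would send the body with the HTTP method `PROPFIND` and a `Depth: 1` header to the share URL.

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"  # the XML namespace WebDAV uses for its elements
ET.register_namespace("D", DAV_NS)


def build_propfind_body():
    """Return the XML body of a PROPFIND asking for two standard properties."""
    root = ET.Element(f"{{{DAV_NS}}}propfind")
    prop = ET.SubElement(root, f"{{{DAV_NS}}}prop")
    ET.SubElement(prop, f"{{{DAV_NS}}}displayname")
    ET.SubElement(prop, f"{{{DAV_NS}}}getlastmodified")
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def list_hrefs(multistatus_xml):
    """Return every href found in a 207 Multi-Status response body."""
    root = ET.fromstring(multistatus_xml)
    return [el.text for el in root.iter(f"{{{DAV_NS}}}href")]


# Hypothetical reply, written by hand for this example.
SAMPLE_RESPONSE = """<D:multistatus xmlns:D="DAV:">
  <D:response><D:href>/condivisione/</D:href></D:response>
  <D:response><D:href>/condivisione/report.docx</D:href></D:response>
</D:multistatus>"""

if __name__ == "__main__":
    print(build_propfind_body().decode("utf-8"))
    print(list_hrefs(SAMPLE_RESPONSE))
```

The built-in OS client mentioned above issues requests of this kind on the user's behalf, which is why no extra software was needed on the PCs.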

In the LAB it was built in very little time, and I must say it fully met the customer's requirements.

Simple, fast and effective…

And the cost?
A few hours of development/customization, the pay-as-you-grow fee of the SaaS, and the related support service.

Friday, 4 March 2016

VMware - Does corespersocket Affect Performance?

There is a lot of outdated information about a vSphere feature that changes how a virtual machine's logical processors are presented: as a specific socket and core configuration. This advanced setting is commonly known as corespersocket.
It was originally intended to address licensing issues where some operating systems had limitations on the number of sockets that could be used, but did not limit core count.
KB Reference:
It’s often been said that this change of processor presentation does not affect performance, but it may impact performance by influencing the sizing and presentation of virtual NUMA to the guest operating system.
Reference: Performance Best Practices for VMware vSphere 5.5 (page 44).

Recommended Practices

#1 When creating a virtual machine, by default, vSphere will create as many virtual sockets as you've requested vCPUs, with cores per socket equal to one. I think of this configuration as “wide” and “flat.” This enables vNUMA to select and present the best virtual NUMA topology to the guest operating system, one that will be optimal on the underlying physical topology.
#2 When you must change the cores per socket, commonly due to licensing constraints, ensure you mirror the physical server's NUMA topology. This is because when a virtual machine is no longer configured by default as “wide” and “flat,” vNUMA will not automatically pick the best NUMA configuration based on the physical server, but will instead honor your configuration (right or wrong), potentially leading to a topology mismatch that does affect performance.
To demonstrate this, the following experiment was performed. Special thanks to Seongbeom for this test and the results.

Test Bed

Dell R815 AMD Opteron 6174 based server with 4x physical sockets by 12x cores per processor = 48x logical processors.

The AMD Opteron 6174 (aka Magny-Cours) processor is essentially two 6-core Istanbul processors assembled into a single socket. This architecture means that each physical socket is actually two NUMA nodes. So this server actually has 8x NUMA nodes and not four, as some may incorrectly assume.
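The arithmetic behind that node count is short enough to write down. The snippet below is only a restatement of the test bed figures given above; the variable names are chosen for this example.

```python
# Test bed figures from the description above.
sockets = 4
cores_per_socket = 12
numa_nodes_per_socket = 2  # each Magny-Cours socket holds two 6-core dies

total_cores = sockets * cores_per_socket          # logical processors seen by vSphere
numa_nodes = sockets * numa_nodes_per_socket      # physical NUMA nodes
cores_per_numa_node = total_cores // numa_nodes   # cores local to each node

if __name__ == "__main__":
    print(total_cores, numa_nodes, cores_per_numa_node)
```

The cores-per-node figure is the one that matters when sizing virtual sockets, since it is the largest group of cores that shares local memory.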
Within esxtop, we can validate the total number of physical NUMA nodes that vSphere detects.

Test VM Configuration #1 – 24 sockets by 1 core per socket (“Wide” and “Flat”)

Since this virtual machine requires 24 logical processors, vNUMA automatically creates the smallest topology that supports the requirement: 24 cores, which means 2 physical sockets and therefore a total of 4 physical NUMA nodes.
Within the Linux-based virtual machine used for our testing, we can validate what vNUMA presented to the guest operating system by using: numactl --hardware
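If you want to check the result in a script rather than by eye, the per-node CPU lists can be pulled out of the `numactl --hardware` text. The sketch below is an assumption-laden illustration: the sample text is written by hand to match the four-node layout described above (it is not the output captured during the test), and only the `node N cpus:` lines are parsed.

```python
import re

# Hand-written sample in the shape numactl --hardware prints; not real test output.
SAMPLE_OUTPUT = """\
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5
node 1 cpus: 6 7 8 9 10 11
node 2 cpus: 12 13 14 15 16 17
node 3 cpus: 18 19 20 21 22 23
"""


def parse_numa_cpus(text):
    """Map each NUMA node id to the list of CPU ids reported for it."""
    nodes = {}
    for line in text.splitlines():
        match = re.match(r"node (\d+) cpus:(.*)", line)
        if match:
            nodes[int(match.group(1))] = [int(c) for c in match.group(2).split()]
    return nodes


if __name__ == "__main__":
    for node, cpus in parse_numa_cpus(SAMPLE_OUTPUT).items():
        print(f"node {node}: {len(cpus)} cpus")
```

Inside a guest, the same function could be fed the text returned by running the command, for example through `subprocess.run`.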

Next, we ran an in-house micro-benchmark, which exercises processors and memory. For this configuration we see a total execution time of 45 seconds.

Next, let’s alter the virtual sockets and cores per socket of this virtual machine to generate another result for comparison.

Test VM Configuration #2 – 2 sockets by 12 cores per socket

In this configuration, while the virtual machine is still configured to have a total of 24 logical processors, we manually intervened and configured 2 virtual sockets by 12 cores per socket. vNUMA will no longer automatically create the topology it thinks is best, but instead will respect this specific configuration and present only two virtual NUMA nodes as defined by our virtual socket count.
Within the Linux-based virtual machine, we can validate what vNUMA presented to the guest operating system by using: numactl --hardware

Re-running the exact same micro-benchmark we get an execution time of 54 seconds.

This configuration, which resulted in a non-optimal virtual NUMA topology, incurred a 20% increase in execution time over the 45-second baseline.

Test VM Configuration #3 – 1 socket by 24 cores per socket

In this configuration, while the virtual machine is again configured to have a total of 24 logical processors, we manually intervened and configured 1 virtual socket by 24 cores per socket. Again, vNUMA will no longer automatically create the topology it thinks is best, but instead will respect this specific configuration and present only one NUMA node as defined by our virtual socket count.
Within the Linux-based virtual machine, we can validate what vNUMA presented to the guest operating system by using: numactl --hardware

Re-running the micro-benchmark one more time we get an execution time of 65 seconds.

This configuration, with yet another non-optimal virtual NUMA topology, incurred a 44% increase in execution time over the 45-second baseline.
To summarize, this test demonstrates that changing the corespersocket configuration of a virtual machine does indeed have an impact on performance in the case when the manually configured virtual NUMA topology does not optimally match the physical NUMA topology.
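When comparing the three runs, it helps to be explicit about which time the percentage is measured against. The sketch below uses only the execution times reported above (45, 54 and 65 seconds); the function names are made up for this example. Measured against the 45-second baseline, the slowdowns are 20% and roughly 44%; measured against the slower run itself, the same gaps come out at roughly 17% and 31%.

```python
def pct_increase_over_baseline(baseline_s, measured_s):
    """Extra execution time as a percentage of the baseline run."""
    return (measured_s - baseline_s) / baseline_s * 100


def pct_of_slower_run(baseline_s, measured_s):
    """The same gap expressed as a percentage of the slower run."""
    return (measured_s - baseline_s) / measured_s * 100


BASELINE_S = 45                                     # 24 sockets x 1 core
RUNS_S = {"2 x 12": 54, "1 x 24": 65}               # manually configured topologies

if __name__ == "__main__":
    for name, seconds in RUNS_S.items():
        print(
            name,
            round(pct_increase_over_baseline(BASELINE_S, seconds), 1),
            round(pct_of_slower_run(BASELINE_S, seconds), 1),
        )
```

Either convention shows the same ordering: the further the virtual topology drifts from the physical one, the longer the benchmark takes.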

The Takeaway

Always spend a few minutes to understand your physical server's NUMA topology and leverage that when rightsizing your virtual machines.
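As a rough aid for that rightsizing step, here is a small sketch. It rests on one simplifying assumption that is mine and not a VMware rule: when cores per socket must be set by hand, each virtual socket should fit inside one physical NUMA node. The function name `suggest_topology` is hypothetical.

```python
import math


def suggest_topology(vcpus, cores_per_numa_node):
    """Return (virtual_sockets, cores_per_socket) so each socket fits in one NUMA node.

    Assumption for this sketch: one virtual socket per physical NUMA node used.
    """
    if vcpus < 1 or cores_per_numa_node < 1:
        raise ValueError("vcpus and cores_per_numa_node must be positive")
    sockets = math.ceil(vcpus / cores_per_numa_node)
    if vcpus % sockets != 0:
        raise ValueError("vCPU count does not divide evenly across the sockets")
    return sockets, vcpus // sockets


if __name__ == "__main__":
    # 24 vCPUs on the test bed above, where each NUMA node has 6 cores.
    print(suggest_topology(24, 6))
```

For the 24 vCPU virtual machine on the test bed this yields 4 sockets by 6 cores, the same four-node layout that vNUMA chose on its own for the “wide” and “flat” configuration. Licensing limits on socket count may rule such a layout out, in which case the trade-off shown by the benchmark applies.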
Other Great References:
The CPU Scheduler in VMware vSphere 5.1
Check out VSCS4811 Extreme Performance Series: Monster Virtual Machines in VMworld Barcelona

Node Interleaving: Enable or Disable?

There seems to be a lot of confusion about this BIOS setting, and I receive lots of questions on whether to enable or disable Node Interleaving. I guess the term “enable” makes people think it is some sort of performance enhancement. Unfortunately the opposite is true, and it is strongly recommended to keep the default setting and leave Node Interleaving disabled.

Node interleaving option only on NUMA architectures
The node interleaving option exists on servers with a non-uniform memory access (NUMA) system architecture. The Intel Nehalem and AMD Opteron are both NUMA architectures. In a NUMA architecture multiple nodes exist. Each node contains a CPU and memory and is connected to the others via a NUMA interconnect. A pCPU uses its onboard memory controller to access its own “local” memory and reaches the remaining “remote” memory via the interconnect. Because memory can reside in different locations, the system experiences “non-uniform” memory access times.

Node interleaving disabled equals NUMA
By using the default setting of Node Interleaving (disabled), the system will build a System Resource Affinity Table (SRAT). ESX uses the SRAT to understand which memory bank is local to a pCPU and tries* to allocate local memory to each vCPU of the virtual machine. By using local memory, the CPU can use its own memory controller, does not have to compete for access to the shared interconnect (bandwidth), and reduces the number of hops needed to access memory (latency).

* If the local memory is full, ESX will resort to storing pages in remote memory, because this will always be faster than swapping them out to disk.
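The local-first behaviour with a remote fallback can be pictured with a toy model. This is only a sketch of the idea described above, not how the ESX scheduler is implemented; the function name, the page counts and the choice to spill to remote nodes in numeric order are all assumptions made for the example.

```python
def place_pages(pages_needed, free_pages, home_node):
    """Toy placement: fill the home node first, then spill to remote nodes.

    free_pages maps node id -> free page count. Returns node id -> pages placed.
    """
    order = [home_node] + [n for n in sorted(free_pages) if n != home_node]
    placement = {}
    remaining = pages_needed
    for node in order:
        if remaining == 0:
            break
        take = min(remaining, free_pages[node])
        if take:
            placement[node] = take
            remaining -= take
    if remaining:
        # In this toy model we simply fail; a real host would have to swap.
        raise MemoryError("not enough free pages on any node")
    return placement


if __name__ == "__main__":
    # Hypothetical two-node host: node 0 is the VM's home node.
    print(place_pages(100, {0: 80, 1: 50}, home_node=0))
```

With 100 pages requested and only 80 free locally, the remaining 20 land on the remote node, which costs interconnect hops but avoids going to disk.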

Node interleaving enabled equals UMA
If Node interleaving is enabled, no SRAT will be built by the system and ESX will be unaware of the underlying physical architecture.

ESX will treat the server as a uniform memory access (UMA) system and perceive the available memory as one contiguous area. This introduces the possibility of storing memory pages in remote memory, forcing the pCPU to transfer data over the NUMA interconnect each time the virtual machine wants to access that memory.
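To get a feel for how often that can happen, consider a deliberately simple model: assume a virtual machine's pages end up spread evenly across all nodes, and that the vCPU runs on one of them. Both assumptions are mine, made for illustration; real placement depends on the platform and the workload.

```python
from fractions import Fraction


def remote_fraction(numa_nodes):
    """Share of pages that are remote when pages are spread evenly over all nodes."""
    if numa_nodes < 1:
        raise ValueError("numa_nodes must be at least 1")
    return Fraction(numa_nodes - 1, numa_nodes)


if __name__ == "__main__":
    for nodes in (1, 2, 4, 8):
        print(nodes, remote_fraction(nodes))
```

Under this model, half of all accesses cross the interconnect on a two-node server, and the share grows with every node added.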
By leaving Node Interleaving disabled, ESX can use the System Resource Affinity Table to select the optimal placement of memory pages for the virtual machines. It is therefore recommended to leave this setting disabled, even if that sounds like you are preventing the system from running more optimally.
