The trend toward high-core-density hardware is also driving the evolution of software development and giving rise to new parallel programming models. Multi-threaded applications written under these models must be able to take advantage of parallel execution across cores, multi-level caches, CPU/memory affinity, and so on.
In this tutorial, I describe how to identify the CPU architecture of a Linux machine from the command line. A CPU architecture is characterized by the number of physical sockets (processors), the number of cores per processor, the multi-level (L1/L2/L3) cache hierarchy, the NUMA (non-uniform memory access) configuration, and so on.
Method One
likwid (Like I Knew What I'm Doing) is a suite of command-line tools designed to help developers of multi-threaded applications. likwid works with Linux kernel 2.6 and higher, and is regularly updated to support the latest generations of Intel and AMD processors. At the time of writing, it supports Intel Core 2, Nehalem, Westmere and Sandy Bridge, as well as AMD K8, K10 and Bulldozer (Interlagos).
To install likwid on Linux from its source tarball (version 3.0.0 here):
$ tar xvfz likwid-3.0.0.tar.gz
$ cd likwid-3.0.0
$ sudo make install
likwid comes with several command-line tools:
- likwid-topology: Display the NUMA and cache topology.
- likwid-perfctr: Display the hardware performance counters of processors.
- likwid-features: Display and change hardware prefetch control bits on Intel Core 2 processors.
- likwid-pin: Pin the threads of a multi-threaded application to specific CPU cores.
- likwid-bench: Benchmarking tool for rapid prototyping of threaded assembly kernels.
- likwid-mpirun: Script enabling CPU pinning of MPI and MPI/threaded hybrid applications.
- likwid-perfscope: Frontend for likwid-perfctr which allows real-time plotting of performance metrics.
- likwid-powermeter: Tool for accessing RAPL counters and querying Turbo mode steps on Intel processors.
- likwid-memsweeper: Tool to clean up ccNUMA (cache-coherent NUMA) memory domains.
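To display the thread, cache and NUMA topology, including the ASCII-art view enabled by -g, run likwid-topology:
$ likwid-topology -g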
-------------------------------------------------------------
CPU type: Intel Core Westmere processor
*************************************************************
Hardware Thread Topology
*************************************************************
Sockets: 2
Cores per socket: 4
Threads per core: 2
-------------------------------------------------------------
HWThread Thread Core Socket
0 0 0 0
1 0 0 1
2 0 10 0
3 0 10 1
4 0 1 0
5 0 1 1
6 0 9 0
7 0 9 1
8 1 0 0
9 1 0 1
10 1 10 0
11 1 10 1
12 1 1 0
13 1 1 1
14 1 9 0
15 1 9 1
-------------------------------------------------------------
Socket 0: ( 0 8 4 12 6 14 2 10 )
Socket 1: ( 1 9 5 13 7 15 3 11 )
-------------------------------------------------------------
*************************************************************
Cache Topology
*************************************************************
Level: 1
Size: 32 kB
Cache groups: ( 0 8 ) ( 4 12 ) ( 6 14 ) ( 2 10 ) ( 1 9 ) ( 5 13 ) ( 7 15 ) ( 3 11 )
-------------------------------------------------------------
Level: 2
Size: 256 kB
Cache groups: ( 0 8 ) ( 4 12 ) ( 6 14 ) ( 2 10 ) ( 1 9 ) ( 5 13 ) ( 7 15 ) ( 3 11 )
-------------------------------------------------------------
Level: 3
Size: 12 MB
Cache groups: ( 0 8 4 12 6 14 2 10 ) ( 1 9 5 13 7 15 3 11 )
-------------------------------------------------------------
*************************************************************
NUMA Topology
*************************************************************
NUMA domains: 2
-------------------------------------------------------------
Domain 0:
Processors: 0 2 4 6 8 10 12 14
Relative distance to nodes: 10 20
Memory: 4207.48 MB free of total 8181.75 MB
-------------------------------------------------------------
Domain 1:
Processors: 1 3 5 7 9 11 13 15
Relative distance to nodes: 20 10
Memory: 4020.77 MB free of total 8192 MB
-------------------------------------------------------------
*************************************************************
Graphical:
*************************************************************
Socket 0:
+-----------------------------------------+
| +-------+ +-------+ +-------+ +-------+ |
| | 0 8 | | 4 12 | | 6 14 | | 2 10 | |
| +-------+ +-------+ +-------+ +-------+ |
| +-------+ +-------+ +-------+ +-------+ |
| | 32kB | | 32kB | | 32kB | | 32kB | |
| +-------+ +-------+ +-------+ +-------+ |
| +-------+ +-------+ +-------+ +-------+ |
| | 256kB | | 256kB | | 256kB | | 256kB | |
| +-------+ +-------+ +-------+ +-------+ |
| +-------------------------------------+ |
| | 12MB | |
| +-------------------------------------+ |
+-----------------------------------------+
Socket 1:
+-----------------------------------------+
| +-------+ +-------+ +-------+ +-------+ |
| | 1 9 | | 5 13 | | 7 15 | | 3 11 | |
| +-------+ +-------+ +-------+ +-------+ |
| +-------+ +-------+ +-------+ +-------+ |
| | 32kB | | 32kB | | 32kB | | 32kB | |
| +-------+ +-------+ +-------+ +-------+ |
| +-------+ +-------+ +-------+ +-------+ |
| | 256kB | | 256kB | | 256kB | | 256kB | |
| +-------+ +-------+ +-------+ +-------+ |
| +-------------------------------------+ |
| | 12MB | |
| +-------------------------------------+ |
+-----------------------------------------+
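The output above is from an HP ProLiant DL380 G7 server. It shows two physical sockets, each holding a quad-core CPU with Hyper-Threading enabled (two hardware threads per core). Each core has a 32 kB L1 cache and a 256 kB L2 cache, the four cores of a socket share a 12 MB L3 cache, and each socket forms its own NUMA domain.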
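The totals in this summary follow from the header of the likwid-topology output: 2 sockets × 4 cores per socket × 2 threads per core = 16 hardware threads, which matches HWThread IDs 0 to 15. As a minimal sketch, assuming the header labels stay exactly as in the likwid 3.0.0 output shown above, the hypothetical shell function total_hw_threads extracts the three values and prints their product:

```shell
# Hypothetical helper (not part of likwid): reads likwid-topology output on
# stdin and prints sockets * cores per socket * threads per core.
# Assumes the "Sockets:", "Cores per socket:" and "Threads per core:" labels
# match the likwid 3.0.0 output shown above.
total_hw_threads() {
    awk -F: '
        /^Sockets:/          { s = $2 + 0 }   # "$2 + 0" drops the leading blank
        /^Cores per socket:/ { c = $2 + 0 }
        /^Threads per core:/ { t = $2 + 0 }
        END { print s * c * t }
    '
}
```

Usage: likwid-topology | total_hw_threads should print 16 on the machine above. Lines such as "Socket 0: ( ... )" do not match the anchored patterns, so the rest of the output is ignored.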
Method Two
hwloc is a command-line suite that gathers various attributes of the underlying processor architecture, such as NUMA memory nodes, multi-level caches, processor sockets, processor cores, PCI devices/bridges, etc.
To install hwloc on Debian, Ubuntu or Linux Mint:
$ sudo apt-get install hwloc
To install hwloc on Fedora, CentOS or RHEL:
$ sudo yum install hwloc
Once the hwloc package is installed, you can use lstopo to display the processor architecture:
$ lstopo --no-io
If you run lstopo in a Linux desktop environment, it pops up a window that visualizes the processor architecture and cache hierarchy.
On a server without a desktop environment, lstopo prints the topology in text format instead:
Machine (16GB)
  NUMANode L#0 (P#0 8182MB) + Socket L#0 + L3 L#0 (12MB)
    L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0
      PU L#0 (P#0)
      PU L#1 (P#8)
    L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1
      PU L#2 (P#2)
      PU L#3 (P#10)
    L2 L#2 (256KB) + L1 L#2 (32KB) + Core L#2
      PU L#4 (P#4)
      PU L#5 (P#12)
    L2 L#3 (256KB) + L1 L#3 (32KB) + Core L#3
      PU L#6 (P#6)
      PU L#7 (P#14)
  NUMANode L#1 (P#1 8192MB) + Socket L#1 + L3 L#1 (12MB)
    L2 L#4 (256KB) + L1 L#4 (32KB) + Core L#4
      PU L#8 (P#1)
      PU L#9 (P#9)
    L2 L#5 (256KB) + L1 L#5 (32KB) + Core L#5
      PU L#10 (P#3)
      PU L#11 (P#11)
    L2 L#6 (256KB) + L1 L#6 (32KB) + Core L#6
      PU L#12 (P#5)
      PU L#13 (P#13)
    L2 L#7 (256KB) + L1 L#7 (32KB) + Core L#7
      PU L#14 (P#7)
      PU L#15 (P#15)
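This text output is easy to post-process. A minimal sketch, assuming the console format shown above (one "Core L#n" token per core and one "PU L#n" line per hardware thread, or processing unit): the hypothetical function count_lstopo counts both, and a PU count higher than the core count indicates that Hyper-Threading is enabled.

```shell
# Hypothetical helper (not part of hwloc): counts cores and processing units
# (PUs) in lstopo text output read from stdin.
# Assumes the "Core L#n" and "PU L#n" labels shown in the output above.
count_lstopo() {
    awk '
        {
            # Scan every field: "Core L#n" may appear at the end of an
            # "L2 ... + L1 ... + Core L#n" line, "PU L#n" at the start of a line.
            for (i = 1; i <= NF; i++) {
                if ($i == "Core" && $(i + 1) ~ /^L#/) cores++
                if ($i == "PU"   && $(i + 1) ~ /^L#/) pus++
            }
        }
        END { printf "cores=%d pus=%d\n", cores, pus }
    '
}
```

On a desktop-less server, lstopo --no-io | count_lstopo should report cores=8 pus=16 for the machine above, i.e. two PUs per core.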
You can also have lstopo export the visualization to an image file by giving it an output file name:
$ lstopo --no-io topo.png
Method Three
numactl is a command line tool for tuning NUMA hardware (such as pinning processes or threads to specific physical cores or ccNUMA nodes).
To install numactl on Debian, Ubuntu or Linux Mint:
$ sudo apt-get install numactl
To install numactl on Fedora, CentOS or RHEL:
$ sudo yum install numactl
To list the available NUMA nodes with numactl, together with each node's CPUs, memory size, free memory and inter-node distances, run:
$ numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14
node 0 size: 8181 MB
node 0 free: 4235 MB
node 1 cpus: 1 3 5 7 9 11 13 15
node 1 size: 8191 MB
node 1 free: 4048 MB
node distances:
node 0 1
0: 10 20
1: 20 10
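Because numactl --hardware prints one "node N cpus:" line per node, its output can be reused when pinning a process to a single node. A minimal sketch, assuming that line format: the hypothetical function node_cpus prints the CPU IDs of a given node as a comma-separated list, the form accepted by taskset -c.

```shell
# Hypothetical helper (not part of numactl): reads `numactl --hardware`
# output on stdin and prints the CPUs of NUMA node $1, comma-separated.
# Assumes the "node N cpus: ..." line format shown above.
node_cpus() {
    awk -v node="$1" '
        $1 == "node" && $2 == node && $3 == "cpus:" {
            out = $4                                  # first CPU ID
            for (i = 5; i <= NF; i++) out = out "," $i
            print out
        }
    '
}
```

For example, taskset -c "$(numactl --hardware | node_cpus 0)" ./myapp would restrict a hypothetical program ./myapp to the CPUs of node 0 (0,2,4,...,14 above). numactl can do this directly with numactl --cpunodebind=0 --membind=0 ./myapp, which also keeps memory allocations on node 0; the helper is mainly useful when another tool expects an explicit CPU list.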
Article from @xmodulo.com