========= INFO ========

memcpy test by Siarhei Siamashka
http://sourceware.org/ml/libc-ports/2009-07/msg00000.html <- sources of the test are located there

The NEON parts were removed, since not all of the hardware tested below has NEON.

========= Marvell Sheevaplug (Kirkwood DDR2-800(400MHz)) =========

--- Running benchmarks (average case/perfect alignment case) ---

very small data test:
memcpy_arm : (3 bytes copy) = 101.4 MB/s / 103.5 MB/s
memcpy_arm : (4 bytes copy) = 90.1 MB/s / 71.6 MB/s
memcpy_arm : (5 bytes copy) = 96.0 MB/s / 89.6 MB/s
memcpy_arm : (7 bytes copy) = 106.8 MB/s / 121.7 MB/s
memcpy_arm : (8 bytes copy) = 118.4 MB/s / 133.7 MB/s
memcpy_arm : (11 bytes copy) = 146.0 MB/s / 183.2 MB/s
memcpy_arm : (12 bytes copy) = 156.8 MB/s / 199.9 MB/s
memcpy_arm : (15 bytes copy) = 186.6 MB/s / 243.1 MB/s
memcpy_arm : (16 bytes copy) = 192.2 MB/s / 259.4 MB/s
memcpy_arm : (24 bytes copy) = 258.9 MB/s / 368.7 MB/s
memcpy_arm : (31 bytes copy) = 303.3 MB/s / 453.3 MB/s

L1 cached data:
memcpy_arm : (4096 bytes copy) = 1044.2 MB/s / 2283.8 MB/s
memcpy_arm : (6144 bytes copy) = 1049.8 MB/s / 2289.4 MB/s

L2 cached data:
memcpy_arm : (65536 bytes copy) = 764.6 MB/s / 1063.0 MB/s
memcpy_arm : (98304 bytes copy) = 668.2 MB/s / 879.8 MB/s

SDRAM:
memcpy_arm : (2097152 bytes copy) = 267.7 MB/s / 293.1 MB/s
memcpy_arm : (3145728 bytes copy) = 252.8 MB/s / 275.6 MB/s

========= Genesi EfikaMX TO3 (i.MX515 DDR2-400(200MHz)) =========

--- Running benchmarks (average case/perfect alignment case) ---

very small data test:
memcpy_arm : (3 bytes copy) = 111.1 MB/s / 114.9 MB/s
memcpy_arm : (4 bytes copy) = 85.9 MB/s / 90.5 MB/s
memcpy_arm : (5 bytes copy) = 97.3 MB/s / 116.7 MB/s
memcpy_arm : (7 bytes copy) = 118.9 MB/s / 162.6 MB/s
memcpy_arm : (8 bytes copy) = 131.3 MB/s / 170.9 MB/s
memcpy_arm : (11 bytes copy) = 164.3 MB/s / 233.8 MB/s
memcpy_arm : (12 bytes copy) = 176.2 MB/s / 233.9 MB/s
memcpy_arm : (15 bytes copy) = 202.4 MB/s / 292.9 MB/s
memcpy_arm : (16 bytes copy) = 211.1 MB/s / 284.7 MB/s
memcpy_arm : (24 bytes copy) = 272.6 MB/s / 361.9 MB/s
memcpy_arm : (31 bytes copy) = 315.0 MB/s / 434.8 MB/s

L1 cached data:
memcpy_arm : (4096 bytes copy) = 1322.6 MB/s / 2224.1 MB/s
memcpy_arm : (6144 bytes copy) = 1338.0 MB/s / 2250.1 MB/s

L2 cached data:
memcpy_arm : (65536 bytes copy) = 919.4 MB/s / 1184.5 MB/s
memcpy_arm : (98304 bytes copy) = 900.8 MB/s / 1162.1 MB/s

SDRAM:
memcpy_arm : (2097152 bytes copy) = 212.6 MB/s / 193.4 MB/s
memcpy_arm : (3145728 bytes copy) = 211.7 MB/s / 223.9 MB/s

(*) 1 MB = 1000000 bytes
(*) 'memcpy_arm' - an implementation for older ARM cores from glibc-ports

========= Nvidia Harmony devboard (Nvidia Tegra2 DDR2-667(333MHz)) ==========

--- Running benchmarks (average case/perfect alignment case) ---

very small data test:
memcpy_arm : (3 bytes copy) = 200.8 MB/s / 197.3 MB/s
memcpy_arm : (4 bytes copy) = 111.1 MB/s / 139.3 MB/s
memcpy_arm : (5 bytes copy) = 123.9 MB/s / 181.5 MB/s
memcpy_arm : (7 bytes copy) = 146.9 MB/s / 278.0 MB/s
memcpy_arm : (8 bytes copy) = 162.9 MB/s / 240.5 MB/s
memcpy_arm : (11 bytes copy) = 209.5 MB/s / 369.3 MB/s
memcpy_arm : (12 bytes copy) = 223.7 MB/s / 342.2 MB/s
memcpy_arm : (15 bytes copy) = 268.9 MB/s / 471.8 MB/s
memcpy_arm : (16 bytes copy) = 280.6 MB/s / 442.2 MB/s
memcpy_arm : (24 bytes copy) = 380.8 MB/s / 596.9 MB/s
memcpy_arm : (31 bytes copy) = 440.6 MB/s / 764.2 MB/s

L1 cached data:
memcpy_arm : (4096 bytes copy) = 1723.1 MB/s / 3415.6 MB/s
memcpy_arm : (6144 bytes copy) = 1746.8 MB/s / 3456.8 MB/s

L2 cached data:
memcpy_arm : (65536 bytes copy) = 1600.2 MB/s / 1863.3 MB/s
memcpy_arm : (98304 bytes copy) = 1603.3 MB/s / 1846.3 MB/s

SDRAM:
memcpy_arm : (2097152 bytes copy) = 345.2 MB/s / 342.5 MB/s
memcpy_arm : (3145728 bytes copy) = 347.6 MB/s / 344.0 MB/s

(*) 1 MB = 1000000 bytes
(*) 'memcpy_arm' - an implementation for older ARM cores from glibc-ports

========= TI OMAP4 Pandaboard (DDR2-400(200MHz, Rev EA1)) ==========

(Note: The kernel used was an L24.9 kernel from Ubuntu)

--- Running benchmarks (average case/perfect alignment case) ---

very small data test:
memcpy_arm : (3 bytes copy) = 122.0 MB/s / 118.8 MB/s
memcpy_arm : (4 bytes copy) = 69.7 MB/s / 86.3 MB/s
memcpy_arm : (5 bytes copy) = 79.1 MB/s / 115.1 MB/s
memcpy_arm : (7 bytes copy) = 91.9 MB/s / 166.7 MB/s
memcpy_arm : (8 bytes copy) = 100.4 MB/s / 154.9 MB/s
memcpy_arm : (11 bytes copy) = 131.6 MB/s / 238.2 MB/s
memcpy_arm : (12 bytes copy) = 140.1 MB/s / 219.7 MB/s
memcpy_arm : (15 bytes copy) = 167.9 MB/s / 305.5 MB/s
memcpy_arm : (16 bytes copy) = 175.9 MB/s / 276.7 MB/s
memcpy_arm : (24 bytes copy) = 234.2 MB/s / 373.5 MB/s
memcpy_arm : (31 bytes copy) = 271.9 MB/s / 489.2 MB/s

L1 cached data:
memcpy_arm : (4096 bytes copy) = 1106.2 MB/s / 2054.1 MB/s
memcpy_arm : (6144 bytes copy) = 1118.0 MB/s / 2078.2 MB/s

L2 cached data:
memcpy_arm : (65536 bytes copy) = 1019.2 MB/s / 1184.3 MB/s
memcpy_arm : (98304 bytes copy) = 1019.0 MB/s / 1176.9 MB/s

SDRAM:
memcpy_arm : (2097152 bytes copy) = 185.1 MB/s / 203.1 MB/s
memcpy_arm : (3145728 bytes copy) = 184.6 MB/s / 215.5 MB/s

(*) 1 MB = 1000000 bytes
(*) 'memcpy_arm' - an implementation for older ARM cores from glibc-ports

========= TI OMAP4 Pandaboard (DDR2-800(400MHz, Rev A1, provided by Gustavo Sverzut)) ==========

(Note: The kernel used was an L24.9 kernel from Ubuntu)

--- Running benchmarks (average case/perfect alignment case) ---

very small data test:
memcpy_arm : (3 bytes copy) = 121.9 MB/s / 118.8 MB/s
memcpy_arm : (4 bytes copy) = 71.0 MB/s / 86.2 MB/s
memcpy_arm : (5 bytes copy) = 78.8 MB/s / 114.8 MB/s
memcpy_arm : (7 bytes copy) = 90.7 MB/s / 166.1 MB/s
memcpy_arm : (8 bytes copy) = 101.0 MB/s / 154.3 MB/s
memcpy_arm : (11 bytes copy) = 131.3 MB/s / 235.3 MB/s
memcpy_arm : (12 bytes copy) = 139.4 MB/s / 219.6 MB/s
memcpy_arm : (15 bytes copy) = 166.8 MB/s / 302.1 MB/s
memcpy_arm : (16 bytes copy) = 173.3 MB/s / 275.1 MB/s
memcpy_arm : (24 bytes copy) = 232.4 MB/s / 373.4 MB/s
memcpy_arm : (31 bytes copy) = 272.9 MB/s / 488.3 MB/s

L1 cached data:
memcpy_arm : (4096 bytes copy) = 1101.2 MB/s / 2048.0 MB/s
memcpy_arm : (6144 bytes copy) = 1112.7 MB/s / 2071.8 MB/s

L2 cached data:
memcpy_arm : (65536 bytes copy) = 1017.4 MB/s / 1185.7 MB/s
memcpy_arm : (98304 bytes copy) = 1016.1 MB/s / 1174.1 MB/s

SDRAM:
memcpy_arm : (2097152 bytes copy) = 255.3 MB/s / 282.5 MB/s
memcpy_arm : (3145728 bytes copy) = 255.1 MB/s / 284.7 MB/s

(*) 1 MB = 1000000 bytes
(*) 'memcpy_arm' - an implementation for older ARM cores from glibc-ports

========= Trimslice Dev-Kit (Nvidia Tegra2 DDR2-667(333MHz)) =========

--- Running benchmarks (average case/perfect alignment case) ---

very small data test:
memcpy_arm : (3 bytes copy) = 186.7 MB/s / 182.2 MB/s
memcpy_arm : (4 bytes copy) = 107.4 MB/s / 131.8 MB/s
memcpy_arm : (5 bytes copy) = 123.5 MB/s / 176.3 MB/s
memcpy_arm : (7 bytes copy) = 141.3 MB/s / 255.3 MB/s
memcpy_arm : (8 bytes copy) = 157.0 MB/s / 237.9 MB/s
memcpy_arm : (11 bytes copy) = 199.6 MB/s / 361.9 MB/s
memcpy_arm : (12 bytes copy) = 213.7 MB/s / 336.6 MB/s
memcpy_arm : (15 bytes copy) = 254.8 MB/s / 464.2 MB/s
memcpy_arm : (16 bytes copy) = 267.8 MB/s / 424.9 MB/s
memcpy_arm : (24 bytes copy) = 354.8 MB/s / 573.7 MB/s
memcpy_arm : (31 bytes copy) = 415.0 MB/s / 748.2 MB/s

L1 cached data:
memcpy_arm : (4096 bytes copy) = 1684.2 MB/s / 3152.9 MB/s
memcpy_arm : (6144 bytes copy) = 1705.6 MB/s / 3183.2 MB/s

L2 cached data:
memcpy_arm : (65536 bytes copy) = 1510.3 MB/s / 1719.2 MB/s
memcpy_arm : (98304 bytes copy) = 1511.7 MB/s / 1695.4 MB/s

SDRAM:
memcpy_arm : (2097152 bytes copy) = 364.6 MB/s / 376.6 MB/s
memcpy_arm : (3145728 bytes copy) = 382.5 MB/s / 397.5 MB/s

(*) 1 MB = 1000000 bytes
(*) 'memcpy_arm' - an implementation for older ARM cores from glibc-ports
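
A note on reading the figures: because the footnote defines 1 MB as 1000000 bytes, each MB/s value can be turned back into an approximate time per copy call. The Python sketch below is not part of the original test. The helper names `parse_result` and `ns_per_copy` are hypothetical, and the regular expression assumes result lines shaped like the ones in this log.

```python
import re

# Matches one result line of the form used in this log, e.g.
# "memcpy_arm : (4096 bytes copy) = 1044.2 MB/s / 2283.8 MB/s"
LINE_RE = re.compile(
    r"(\w+)\s*:\s*\((\d+) bytes copy\)\s*=\s*([\d.]+) MB/s\s*/\s*([\d.]+) MB/s"
)

# Taken from the log's footnote: (*) 1 MB = 1000000 bytes
BYTES_PER_MB = 1_000_000


def parse_result(line):
    """Return (name, size_bytes, average_mb_s, aligned_mb_s), or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    name, size, average, aligned = m.groups()
    return name, int(size), float(average), float(aligned)


def ns_per_copy(size_bytes, mb_per_s):
    """Approximate time of one copy call, in nanoseconds, implied by a MB/s figure."""
    return size_bytes / (mb_per_s * BYTES_PER_MB) * 1e9


if __name__ == "__main__":
    sample = "memcpy_arm : (3 bytes copy) = 101.4 MB/s / 103.5 MB/s"
    name, size, average, aligned = parse_result(sample)
    print(f"{name}, {size} bytes: {ns_per_copy(size, average):.1f} ns per call (average case)")
```

Read this way, the low MB/s values in the "very small data test" rows mostly reflect a fixed per-call cost of a few tens of nanoseconds rather than memory bandwidth.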