A drop-in replacement for some features of Go's standard unicode
package.
This package works by computing an array jump table for fast Unicode property lookup.
Drawbacks include:
- The startup cost of computing the array of properties
- Memory consumption of ~25Mb on x86-64
This package isn't a replacement for the standard unicode
package; it's recommended this package is trialled when performance is a priority; e.g. mass-text normalisation and segmentation. For input containing highly randomised data across the Unicode spectrum it will perform worse than the standard unicode
package due to cache locality. As always: measure.
Use make bench
to run the benchmarks.
Dell XPS 13 9370 (Intel i7-8550U) / Ubuntu 16.04, CPU in performance mode:
benchmark old ns/op new ns/op delta
BenchmarkIsDigit-8 34.1 1.10 -96.77%
BenchmarkIsDigitUnsafe-8 33.8 0.55 -98.37%
BenchmarkIsGraphic-8 127 1.11 -99.13%
BenchmarkIsGraphicUnsafe-8 128 0.83 -99.35%
BenchmarkIsLetter-8 40.9 1.10 -97.31%
BenchmarkIsLetterUnsafe-8 40.7 0.55 -98.65%
BenchmarkIsLower-8 30.2 1.12 -96.29%
BenchmarkIsLowerUnsafe-8 30.2 0.83 -97.25%
BenchmarkIsMark-8 35.0 1.10 -96.86%
BenchmarkIsMarkUnsafe-8 35.1 0.55 -98.43%
BenchmarkIsNumber-8 30.1 1.10 -96.35%
BenchmarkIsNumberUnsafe-8 30.3 0.83 -97.26%
BenchmarkIsPrint-8 125 1.10 -99.12%
BenchmarkIsPrintUnsafe-8 125 0.54 -99.57%
BenchmarkIsPunct-8 30.1 1.11 -96.31%
BenchmarkIsPunctUnsafe-8 29.9 0.82 -97.26%
BenchmarkIsSpace-8 14.2 1.10 -92.25%
BenchmarkIsSpaceUnsafe-8 14.3 0.55 -96.15%
BenchmarkIsSymbol-8 30.5 1.11 -96.36%
BenchmarkIsSymbolUnsafe-8 30.2 0.83 -97.25%
BenchmarkIsTitle-8 14.8 1.10 -92.57%
BenchmarkIsTitleUnsafe-8 14.7 0.55 -96.26%
BenchmarkIsUpper-8 30.0 1.11 -96.30%
BenchmarkIsUpperUnsafe-8 30.2 0.82 -97.28%
BenchmarkSimpleFold-8 15.3 0.28 -98.17%
BenchmarkSimpleFoldUnsafe-8 15.1 0.55 -96.36%
BenchmarkToLower-8 209746 90991 -56.62%
BenchmarkToLowerUnsafe-8 208996 91548 -56.20%
BenchmarkToTitle-8 222480 101441 -54.40%
BenchmarkToTitleUnsafe-8 219774 102589 -53.32%
BenchmarkToUpper-8 223247 103880 -53.47%
BenchmarkToUpperUnsafe-8 221264 104025 -52.99%
BenchmarkStringToLowerCase-8 308085 108448 -64.80%
BenchmarkStringToLowerCaseUnsafe-8 308406 91254 -70.41%
BenchmarkStringToTitleCase-8 318338 120817 -62.05%
BenchmarkStringToTitleCaseUnsafe-8 321452 103651 -67.76%
BenchmarkStringToUpperCase-8 325173 120402 -62.97%
BenchmarkStringToUpperCaseUnsafe-8 326598 104489 -68.01%
benchmark old MB/s new MB/s speedup
BenchmarkIsDigit-8 234.40 7267.85 31.01x
BenchmarkIsDigitUnsafe-8 236.47 14623.67 61.84x
BenchmarkIsGraphic-8 62.55 7229.24 115.58x
BenchmarkIsGraphicUnsafe-8 62.04 9668.54 155.84x
BenchmarkIsLetter-8 195.50 7264.06 37.16x
BenchmarkIsLetterUnsafe-8 196.44 14480.12 73.71x
BenchmarkIsLower-8 264.99 7117.67 26.86x
BenchmarkIsLowerUnsafe-8 264.74 9621.64 36.34x
BenchmarkIsMark-8 228.38 7278.42 31.87x
BenchmarkIsMarkUnsafe-8 227.88 14459.79 63.45x
BenchmarkIsNumber-8 266.12 7285.09 27.38x
BenchmarkIsNumberUnsafe-8 264.07 9664.48 36.60x
BenchmarkIsPrint-8 63.93 7242.14 113.28x
BenchmarkIsPrintUnsafe-8 63.95 14881.46 232.70x
BenchmarkIsPunct-8 266.03 7217.83 27.13x
BenchmarkIsPunctUnsafe-8 267.55 9734.01 36.38x
BenchmarkIsSpace-8 493.22 6351.52 12.88x
BenchmarkIsSpaceUnsafe-8 490.55 12647.13 25.78x
BenchmarkIsSymbol-8 262.54 7225.87 27.52x
BenchmarkIsSymbolUnsafe-8 265.11 9598.59 36.21x
BenchmarkIsTitle-8 471.93 6381.03 13.52x
BenchmarkIsTitleUnsafe-8 475.34 12706.11 26.73x
BenchmarkIsUpper-8 266.55 7236.34 27.15x
BenchmarkIsUpperUnsafe-8 264.68 9711.01 36.69x
BenchmarkSimpleFold-8 261.92 14454.61 55.19x
BenchmarkSimpleFoldUnsafe-8 264.39 7266.35 27.48x
BenchmarkToLower-8 76.34 175.98 2.31x
BenchmarkToLowerUnsafe-8 76.62 174.91 2.28x
BenchmarkToTitle-8 71.97 157.85 2.19x
BenchmarkToTitleUnsafe-8 72.86 156.09 2.14x
BenchmarkToUpper-8 71.73 154.15 2.15x
BenchmarkToUpperUnsafe-8 72.37 153.93 2.13x
BenchmarkStringToLowerCase-8 51.98 147.66 2.84x
BenchmarkStringToLowerCaseUnsafe-8 51.92 175.48 3.38x
BenchmarkStringToTitleCase-8 50.30 132.54 2.63x
BenchmarkStringToTitleCaseUnsafe-8 49.81 154.49 3.10x
BenchmarkStringToUpperCase-8 49.24 133.00 2.70x
BenchmarkStringToUpperCaseUnsafe-8 49.03 153.25 3.13x
Raspberry Pi 2 Model B / Ubuntu 16.04:
benchmark old ns/op new ns/op delta
BenchmarkIsDigit-4 712 14.7 -97.94%
BenchmarkIsDigitUnsafe-4 716 6.69 -99.07%
BenchmarkIsGraphic-4 2051 13.4 -99.35%
BenchmarkIsGraphicUnsafe-4 2046 6.72 -99.67%
BenchmarkIsLetter-4 601 13.4 -97.77%
BenchmarkIsLetterUnsafe-4 600 6.73 -98.88%
BenchmarkIsLower-4 494 13.4 -97.29%
BenchmarkIsLowerUnsafe-4 495 6.70 -98.65%
BenchmarkIsMark-4 542 13.4 -97.53%
BenchmarkIsMarkUnsafe-4 542 6.73 -98.76%
BenchmarkIsNumber-4 492 14.5 -97.05%
BenchmarkIsNumberUnsafe-4 490 6.69 -98.63%
BenchmarkIsPrint-4 1985 13.4 -99.32%
BenchmarkIsPrintUnsafe-4 1980 6.72 -99.66%
BenchmarkIsPunct-4 499 13.4 -97.31%
BenchmarkIsPunctUnsafe-4 501 6.72 -98.66%
BenchmarkIsSpace-4 315 14.4 -95.43%
BenchmarkIsSpaceUnsafe-4 315 6.69 -97.88%
BenchmarkIsSymbol-4 492 13.4 -97.28%
BenchmarkIsSymbolUnsafe-4 493 6.71 -98.64%
BenchmarkIsTitle-4 334 13.5 -95.96%
BenchmarkIsTitleUnsafe-4 333 6.72 -97.98%
BenchmarkIsUpper-4 495 14.7 -97.03%
BenchmarkIsUpperUnsafe-4 495 6.69 -98.65%
BenchmarkSimpleFold-4 213 4.48 -97.90%
BenchmarkSimpleFoldUnsafe-4 213 4.46 -97.91%
BenchmarkToLower-4 159897 58494 -63.42%
BenchmarkToLowerUnsafe-4 159290 62179 -60.96%
BenchmarkToTitle-4 158902 61126 -61.53%
BenchmarkToTitleUnsafe-4 159651 65347 -59.07%
BenchmarkToUpper-4 159147 61988 -61.05%
BenchmarkToUpperUnsafe-4 159190 55153 -65.35%
BenchmarkStringToLowerCase-4 245836 67834 -72.41%
BenchmarkStringToLowerCaseUnsafe-4 245460 66175 -73.04%
BenchmarkStringToTitleCase-4 245313 81020 -66.97%
BenchmarkStringToTitleCaseUnsafe-4 246914 61785 -74.98%
BenchmarkStringToUpperCase-4 246976 69070 -72.03%
BenchmarkStringToUpperCaseUnsafe-4 246685 55126 -77.65%
benchmark old MB/s new MB/s speedup
BenchmarkIsDigit-4 11.24 545.27 48.51x
BenchmarkIsDigitUnsafe-4 11.17 1195.03 106.99x
BenchmarkIsGraphic-4 3.90 597.51 153.21x
BenchmarkIsGraphicUnsafe-4 3.91 1190.33 304.43x
BenchmarkIsLetter-4 13.31 595.63 44.75x
BenchmarkIsLetterUnsafe-4 13.33 1188.44 89.16x
BenchmarkIsLower-4 16.18 595.76 36.82x
BenchmarkIsLowerUnsafe-4 16.16 1194.75 73.93x
BenchmarkIsMark-4 14.74 595.65 40.41x
BenchmarkIsMarkUnsafe-4 14.74 1188.45 80.63x
BenchmarkIsNumber-4 16.26 550.34 33.85x
BenchmarkIsNumberUnsafe-4 16.29 1195.30 73.38x
BenchmarkIsPrint-4 4.03 595.63 147.80x
BenchmarkIsPrintUnsafe-4 4.04 1191.12 294.83x
BenchmarkIsPunct-4 16.00 595.16 37.20x
BenchmarkIsPunctUnsafe-4 15.96 1191.26 74.64x
BenchmarkIsSpace-4 22.21 485.16 21.84x
BenchmarkIsSpaceUnsafe-4 22.20 1045.68 47.10x
BenchmarkIsSymbol-4 16.24 595.77 36.69x
BenchmarkIsSymbolUnsafe-4 16.22 1191.50 73.46x
BenchmarkIsTitle-4 20.94 519.86 24.83x
BenchmarkIsTitleUnsafe-4 20.97 1042.31 49.70x
BenchmarkIsUpper-4 16.16 545.92 33.78x
BenchmarkIsUpperUnsafe-4 16.16 1195.17 73.96x
BenchmarkSimpleFold-4 18.73 893.80 47.72x
BenchmarkSimpleFoldUnsafe-4 18.73 896.38 47.86x
BenchmarkToLower-4 4.43 12.10 2.73x
BenchmarkToLowerUnsafe-4 4.44 11.39 2.57x
BenchmarkToTitle-4 4.46 11.58 2.60x
BenchmarkToTitleUnsafe-4 4.43 10.83 2.44x
BenchmarkToUpper-4 4.45 11.42 2.57x
BenchmarkToUpperUnsafe-4 4.45 12.84 2.89x
BenchmarkStringToLowerCase-4 2.88 10.44 3.62x
BenchmarkStringToLowerCaseUnsafe-4 2.88 10.70 3.72x
BenchmarkStringToTitleCase-4 2.89 8.74 3.02x
BenchmarkStringToTitleCaseUnsafe-4 2.87 11.46 3.99x
BenchmarkStringToUpperCase-4 2.87 10.25 3.57x
BenchmarkStringToUpperCaseUnsafe-4 2.87 12.84 4.47x