Discussion:
Fast lookup of 234 possible 1/2/3 character codes
(too old to reply)
Robert Prins
2023-02-13 18:00:11 UTC
Permalink
How likely is it that that there's a faster way than doing just a binary lookup,
i.e. something like the current

mov edx, in_plate
bswap edx

mov esi, 1
mov edi, type plates / type plate

@01:
lea eax, [esi + edi]
and eax, -2
shl eax, 5

mov ebx, dword ptr plates[eax - type plate]
bswap ebx

shr eax, 6

cmp ebx, edx
je @04

jl @02

lea edi, [eax - 1]
jmp @03

@02:
lea esi, [eax + 1]

@03:
cmp esi, edi
jle @01

dw op_ud2

@04:
ret

where plates is an array of plate:

//------------------------------------------------
// List of all countries / nationalities
//
// - sign : license plate sign
//
// - stat : status
// ' ': official
// '*': unofficial
// - visit:
// '!': visited (non-hitching, comment only)
// 'V': visited
// ' ': unvisited
// - hitch:
// 'H': hitched-in
// '*': hitched through
// - local:
// '+': in-country ride with local
// '-': out-of-country ride with nationality
//------------------------------------------------
type
plate = record
sign : tycona; // array [1..4] of char
stat : char;
visit: char;
hitch: char;
local: char;
cntyl: byte;
cntys: array [1..55] of char;
end;

Thanks,

Robert
--
Robert AH Prins
robert(a)prino(d)org
The hitchhiking grandfather - https://prino.neocities.org/
Some REXX code for use on z/OS - https://prino.neocities.org/zOS/zOS-Tools.html
Terje Mathisen
2023-02-15 08:14:32 UTC
Permalink
This one cries out for a hash table!

Since you have a fixed set of codes, you can spend a little time finding
a fast hash that converts this into a single byte value with zero
collisions, at which point the hash entry is your country code?

Try something like c[0]+c[1]*2+c[2]*4 as a first attempt, then mix in a
few XORs or rotations instead of those SHL only ops.

It should be doable in well under 10 cycles and maybe under 5...

If a near-perfect hash is too slow or hard to find, then you could allow
a fixed maximal number of collisions (like max two entries) and scan
those, but that needs room for the original values in the hash table.

Terje
Post by Robert Prins
How likely is it that that there's a faster way than doing just a binary
lookup, i.e. something like the current
  mov     edx, in_plate
  bswap   edx
  mov     esi, 1
  mov     edi, type plates / type plate
  lea     eax, [esi + edi]
  and     eax, -2
  shl     eax, 5
  mov     ebx, dword ptr plates[eax - type plate]
  bswap   ebx
  shr     eax, 6
  cmp     ebx, edx
  lea     edi, [eax - 1]
  lea     esi, [eax + 1]
  cmp     esi, edi
  dw      op_ud2
  ret
//------------------------------------------------
// List of all countries / nationalities
//
// - sign      : license plate sign
//
// - stat : status
//    ' ': official
//    '*': unofficial
//    '!': visited (non-hitching, comment only)
//    'V': visited
//    ' ': unvisited
//    'H': hitched-in
//    '*': hitched through
//    '+': in-country ride with local
//    '-': out-of-country ride with nationality
//------------------------------------------------
type
  plate = record
            sign : tycona; // array [1..4] of char
            stat : char;
            visit: char;
            hitch: char;
            local: char;
            cntyl: byte;
            cntys: array [1..55] of char;
          end;
Thanks,
Robert
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Robert Prins
2023-02-15 20:02:24 UTC
Permalink
Post by Terje Mathisen
This one cries out for a hash table!
Yes, but with the 1/2/3 letter codes being the (un)official car licence plate
codes of all 195 independent countries, with some extra odd-man-out ones like
GBJ/GBG, and the oddest, SMA, I've given this a try, but GNU's GPERF doesn't
come up with anything. And yes, I've also read
<http://courses.cs.vt.edu/~cs3114/Fall10/Notes/Supplemental/p17-cichelli.pdf> ;)
Post by Terje Mathisen
Since you have a fixed set of codes, you can spend a little time finding a fast
hash that converts this into a single byte value with zero collisions, at which
point the hash entry is your country code?
I've tried I don't know how many variations, but nothing has come close, and
although it will be slower than hashing, it might well be that a simple lookup
of the first three of four most common countries/nationalities (D/B/LT/GB) might
be a decent compromise, I did use this technique at a Belgian employer where the
three most common nationalities were B, NL, and F, accounting for more than 90%
of the insured. The best I got out of GPERF was a table of 425 entries, but the
email from 2016 didn't include the hash function. :( But being someone who
rarely, if ever, deletes something, it might still be around.
Post by Terje Mathisen
Try something like c[0]+c[1]*2+c[2]*4 as a first attempt, then mix in a few XORs
or rotations instead of those SHL only ops.
It should be doable in well under 10 cycles and maybe under 5...
Which would handsomely beat a binary lookup...
Post by Terje Mathisen
If a near-perfect hash is too slow or hard to find, then you could allow a fixed
maximal number of collisions (like max two entries) and scan those, but that
needs room for the original values in the hash table.
I've got the code that Borland used in TP (4?/5?6/7) to do hashing, and that
should be (relatively) easy to adapt.

Robert
--
Robert AH Prins
robert(a)prino(d)org
The hitchhiking grandfather - https://prino.neocities.org/
Some REXX code for use on z/OS - https://prino.neocities.org/zOS/zOS-Tools.html
Post by Terje Mathisen
Post by Robert Prins
How likely is it that that there's a faster way than doing just a binary
lookup, i.e. something like the current
   mov     edx, in_plate
   bswap   edx
   mov     esi, 1
   mov     edi, type plates / type plate
   lea     eax, [esi + edi]
   and     eax, -2
   shl     eax, 5
   mov     ebx, dword ptr plates[eax - type plate]
   bswap   ebx
   shr     eax, 6
   cmp     ebx, edx
   lea     edi, [eax - 1]
   lea     esi, [eax + 1]
   cmp     esi, edi
   dw      op_ud2
   ret
//------------------------------------------------
// List of all countries / nationalities
//
// - sign      : license plate sign
//
// - stat : status
//    ' ': official
//    '*': unofficial
//    '!': visited (non-hitching, comment only)
//    'V': visited
//    ' ': unvisited
//    'H': hitched-in
//    '*': hitched through
//    '+': in-country ride with local
//    '-': out-of-country ride with nationality
//------------------------------------------------
type
   plate = record
             sign : tycona; // array [1..4] of char
             stat : char;
             visit: char;
             hitch: char;
             local: char;
             cntyl: byte;
             cntys: array [1..55] of char;
           end;
Terje Mathisen
2023-02-15 21:15:17 UTC
Permalink
Can you please post the full list? comma, space or newline separated?

Terje
Post by Robert Prins
Post by Terje Mathisen
This one cries out for a hash table!
Yes, but with the 1/2/3 letter codes being the (un)official car licence
plate codes of all 195 independent countries, with some extra
odd-man-out ones like GBJ/GBG, and the oddest, SMA, I've given this a
try, but GNU's GPERF doesn't come up with anything. And yes, I've also
read
<http://courses.cs.vt.edu/~cs3114/Fall10/Notes/Supplemental/p17-cichelli.pdf> ;)
Post by Terje Mathisen
Since you have a fixed set of codes, you can spend a little time
finding a fast hash that converts this into a single byte value with
zero collisions, at which point the hash entry is your country code?
I've tried I don't know how many variations, but nothing has come close,
and although it will be slower than hashing, it might well be that a
simple lookup of the first three of four most common
countries/nationalities (D/B/LT/GB) might be a decent compromise, I did
use this technique at a Belgian employer where the three most common
nationalities were B, NL, and F, accounting for more than 90% of the
insured. The best I got out of GPERF was a table of 425 entries, but the
email from 2016 didn't include the hash function. :( But being someone
who rarely, if ever, deletes something, it might still be around.
Post by Terje Mathisen
Try something like c[0]+c[1]*2+c[2]*4 as a first attempt, then mix in
a few XORs or rotations instead of those SHL only ops.
It should be doable in well under 10 cycles and maybe under 5...
Which would handsomely beat a binary lookup...
Post by Terje Mathisen
If a near-perfect hash is too slow or hard to find, then you could
allow a fixed maximal number of collisions (like max two entries) and
scan those, but that needs room for the original values in the hash
table.
I've got the code that Borland used in TP (4?/5?6/7) to do hashing, and
that should be (relatively) easy to adapt.
Robert
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Robert Prins
2023-02-16 00:21:17 UTC
Permalink
Post by Terje Mathisen
Can you please post the full list? comma, space or newline separated?
(sign: 'A '; cnty: 'Austria ' ),
(sign: 'AFG '; cnty: 'Afghanistan ' ),
(sign: 'AG '; cnty: 'Antigua and Barbuda ' ),
(sign: 'AL '; cnty: 'Albania ' ),
(sign: 'AM '; cnty: 'Armenia ' ),
(sign: 'AND '; cnty: 'Andorra ' ),
(sign: 'ANG '; cnty: 'Angola ' ),
(sign: 'ARK '; cnty: 'Antarctica ' ),
(sign: 'ARU '; cnty: 'Aruba ' ),
(sign: 'AS '; cnty: 'Asturias ' ),
(sign: 'AUA '; cnty: 'Aruba ' ),
(sign: 'AUS '; cnty: 'Australia ' ),
(sign: 'AX '; cnty: 'Aland Islands ' ),
(sign: 'AXA '; cnty: 'Anguilla ' ),
(sign: 'AZ '; cnty: 'Azerbaijan ' ),
(sign: 'B '; cnty: 'Belgium ' ),
(sign: 'BD '; cnty: 'Bangladesh ' ),
(sign: 'BDS '; cnty: 'Barbados ' ),
(sign: 'BF '; cnty: 'Burkina Faso ' ),
(sign: 'BG '; cnty: 'Bulgaria ' ),
(sign: 'BH '; cnty: 'Belize ' ),
(sign: 'BHT '; cnty: 'Bhutan ' ),
(sign: 'BIH '; cnty: 'Bosnia and Herzegovina ' ),
(sign: 'BOL '; cnty: 'Bolivia ' ),
(sign: 'BR '; cnty: 'Brazil ' ),
(sign: 'BRN '; cnty: 'Bahrain ' ),
(sign: 'BRU '; cnty: 'Brunei ' ),
(sign: 'BS '; cnty: 'Bahamas ' ),
(sign: 'BUR '; cnty: 'Myanmar ' ),
(sign: 'BVI '; cnty: 'British Virgin Islands ' ),
(sign: 'BW '; cnty: 'Botswana ' ),
(sign: 'BY '; cnty: 'Belarus ' ),
(sign: 'BZ '; cnty: 'Belize ' ),
(sign: 'C '; cnty: 'Cuba ' ),
(sign: 'CAM '; cnty: 'Cameroon ' ),
(sign: 'CDN '; cnty: 'Canada ' ),
(sign: 'CGO '; cnty: 'Democratic Republic of the Congo ' ),
(sign: 'CH '; cnty: 'Switzerland ' ),
(sign: 'CHN '; cnty: 'People''s Republic of China ' ),
(sign: 'CI '; cnty: 'Ivory Coast (Cote d''Ivoire) ' ),
(sign: 'CL '; cnty: 'Sri Lanka ' ),
(sign: 'CO '; cnty: 'Colombia ' ),
(sign: 'COM '; cnty: 'Comoros ' ),
(sign: 'CR '; cnty: 'Costa Rica ' ),
(sign: 'CV '; cnty: 'Cape Verde ' ),
(sign: 'CY '; cnty: 'Cyprus ' ),
(sign: 'CYM '; cnty: 'Wales ' ),
(sign: 'CZ '; cnty: 'Czech Republic ' ),
(sign: 'D '; cnty: 'Germany ' ),
(sign: 'DJI '; cnty: 'Djibouti ' ),
(sign: 'DK '; cnty: 'Denmark ' ),
(sign: 'DOM '; cnty: 'Dominican Republic ' ),
(sign: 'DY '; cnty: 'Benin ' ),
(sign: 'DZ '; cnty: 'Algeria ' ),
(sign: 'E '; cnty: 'Spain ' ),
(sign: 'EAK '; cnty: 'Kenya ' ),
(sign: 'EAT '; cnty: 'Tanzania ' ),
(sign: 'EAU '; cnty: 'Uganda ' ),
(sign: 'EAZ '; cnty: 'Zanzibar ' ),
(sign: 'EC '; cnty: 'Ecuador ' ),
(sign: 'ENG '; cnty: 'England ' ),
(sign: 'ER '; cnty: 'Eritrea ' ),
(sign: 'ES '; cnty: 'El Salvador ' ),
(sign: 'EST '; cnty: 'Estonia ' ),
(sign: 'ET '; cnty: 'Egypt ' ),
(sign: 'ETH '; cnty: 'Ethiopia ' ),
(sign: 'F '; cnty: 'France ' ),
(sign: 'FIN '; cnty: 'Finland ' ),
(sign: 'FJI '; cnty: 'Fiji ' ),
(sign: 'FL '; cnty: 'Liechtenstein ' ),
(sign: 'FO '; cnty: 'Faroe Islands ' ),
(sign: 'FSM '; cnty: 'Federated States of Micronesia ' ),
(sign: 'G '; cnty: 'Gabon ' ),
(sign: 'GB '; cnty: 'United Kingdom (UK) ' ),
(sign: 'GBA '; cnty: 'Alderney ' ),
(sign: 'GBG '; cnty: 'Guernsey ' ),
(sign: 'GBJ '; cnty: 'Jersey ' ),
(sign: 'GBM '; cnty: 'Isle of Man ' ),
(sign: 'GBZ '; cnty: 'Gibraltar ' ),
(sign: 'GCA '; cnty: 'Guatemala ' ),
(sign: 'GE '; cnty: 'Georgia ' ),
(sign: 'GH '; cnty: 'Ghana ' ),
(sign: 'GQ '; cnty: 'Equatorial Guinea ' ),
(sign: 'GR '; cnty: 'Greece ' ),
(sign: 'GUY '; cnty: 'Guyana ' ),
(sign: 'GW '; cnty: 'Guinea-Bissau ' ),
(sign: 'H '; cnty: 'Hungary ' ),
(sign: 'HK '; cnty: 'Hong Kong ' ),
(sign: 'HKJ '; cnty: 'Jordan ' ),
(sign: 'HN '; cnty: 'Honduras ' ),
(sign: 'HR '; cnty: 'Croatia ' ),
(sign: 'I '; cnty: 'Italy ' ),
(sign: 'IL '; cnty: 'Israel ' ),
(sign: 'IND '; cnty: 'India ' ),
(sign: 'IR '; cnty: 'Iran ' ),
(sign: 'IRL '; cnty: 'Ireland ' ),
(sign: 'IRQ '; cnty: 'Iraq ' ),
(sign: 'IS '; cnty: 'Iceland ' ),
(sign: 'J '; cnty: 'Japan ' ),
(sign: 'JA '; cnty: 'Jamaica ' ),
(sign: 'K '; cnty: 'Cambodia ' ),
(sign: 'KAN '; cnty: 'Saint Kitts and Nevis ' ),
(sign: 'KGZ '; cnty: 'Kyrgyzstan ' ),
(sign: 'KIR '; cnty: 'Kiribati ' ),
(sign: 'KN '; cnty: 'Greenland ' ),
(sign: 'KP '; cnty: 'Democratic People''s Republic of Korea ' ),
(sign: 'KS '; cnty: 'Kyrgyzstan ' ),
(sign: 'KSA '; cnty: 'Saudi Arabia ' ),
(sign: 'KWT '; cnty: 'Kuwait ' ),
(sign: 'KZ '; cnty: 'Kazakhstan ' ),
(sign: 'L '; cnty: 'Luxembourg ' ),
(sign: 'LAO '; cnty: 'Laos ' ),
(sign: 'LAR '; cnty: 'Libya ' ),
(sign: 'LB '; cnty: 'Liberia ' ),
(sign: 'LS '; cnty: 'Lesotho ' ),
(sign: 'LT '; cnty: 'Lithuania ' ),
(sign: 'LV '; cnty: 'Latvia ' ),
(sign: 'M '; cnty: 'Malta ' ),
(sign: 'MA '; cnty: 'Morocco ' ),
(sign: 'MAL '; cnty: 'Malaysia ' ),
(sign: 'MC '; cnty: 'Monaco ' ),
(sign: 'MD '; cnty: 'Moldova ' ),
(sign: 'MEX '; cnty: 'Mexico ' ),
(sign: 'MGL '; cnty: 'Mongolia ' ),
(sign: 'MH '; cnty: 'Marshall Islands ' ),
(sign: 'MK '; cnty: 'Macedonia (NMK) ' ),
(sign: 'MNE '; cnty: 'Montenegro ' ),
(sign: 'MO '; cnty: 'Macau ' ),
(sign: 'MOC '; cnty: 'Mozambique ' ),
(sign: 'MS '; cnty: 'Mauritius ' ),
(sign: 'MV '; cnty: 'Maldives ' ),
(sign: 'MW '; cnty: 'Malawi ' ),
(sign: 'N '; cnty: 'Norway ' ),
(sign: 'NA '; cnty: 'Netherlands Antilles ' ),
(sign: 'NAM '; cnty: 'Namibia ' ),
(sign: 'NAU '; cnty: 'Nauru ' ),
(sign: 'NC '; cnty: 'New Caledonia ' ),
(sign: 'NEP '; cnty: 'Nepal ' ),
(sign: 'NI '; cnty: 'Northern Ireland ' ),
(sign: 'NIC '; cnty: 'Nicaragua ' ),
(sign: 'NL '; cnty: 'Netherlands ' ),
(sign: 'NMK '; cnty: 'North Macedonia ' ),
(sign: 'NZ '; cnty: 'New Zealand ' ),
(sign: 'OM '; cnty: 'Oman ' ),
(sign: 'P '; cnty: 'Portugal ' ),
(sign: 'PA '; cnty: 'Panama ' ),
(sign: 'PAL '; cnty: 'Palau ' ),
(sign: 'PE '; cnty: 'Peru ' ),
(sign: 'PK '; cnty: 'Pakistan ' ),
(sign: 'PL '; cnty: 'Poland ' ),
(sign: 'PMR '; cnty: 'Transnistria ' ),
(sign: 'PNG '; cnty: 'Papua New Guinea ' ),
(sign: 'PR '; cnty: 'Puerto Rico ' ),
(sign: 'PS '; cnty: 'Palestine ' ),
(sign: 'PY '; cnty: 'Paraguay ' ),
(sign: 'Q '; cnty: 'Qatar ' ),
(sign: 'RA '; cnty: 'Argentina ' ),
(sign: 'RB '; cnty: 'Botswana ' ),
(sign: 'RC '; cnty: 'Republic of China (Taiwan) ' ),
(sign: 'RCA '; cnty: 'Central African Republic ' ),
(sign: 'RCB '; cnty: 'Republic of the Congo ' ),
(sign: 'RCH '; cnty: 'Chile ' ),
(sign: 'RG '; cnty: 'Guinea ' ),
(sign: 'RGB '; cnty: 'Guinea-Bissau ' ),
(sign: 'RH '; cnty: 'Haiti ' ),
(sign: 'RI '; cnty: 'Indonesia ' ),
(sign: 'RIM '; cnty: 'Mauritania ' ),
(sign: 'RKS '; cnty: 'Kosovo ' ),
(sign: 'RL '; cnty: 'Lebanon ' ),
(sign: 'RM '; cnty: 'Madagascar ' ),
(sign: 'RMM '; cnty: 'Mali ' ),
(sign: 'RN '; cnty: 'Niger ' ),
(sign: 'RO '; cnty: 'Romania ' ),
(sign: 'ROK '; cnty: 'Republic of Korea ' ),
(sign: 'ROU '; cnty: 'Uruguay (UY) ' ),
(sign: 'RP '; cnty: 'Philippines ' ),
(sign: 'RSM '; cnty: 'San Marino ' ),
(sign: 'RU '; cnty: 'Burundi ' ),
(sign: 'RUS '; cnty: 'Russia ' ),
(sign: 'RWA '; cnty: 'Rwanda ' ),
(sign: 'S '; cnty: 'Sweden ' ),
(sign: 'SCO '; cnty: 'Scotland ' ),
(sign: 'SCV '; cnty: 'Vatican City ' ),
(sign: 'SD '; cnty: 'Swaziland ' ),
(sign: 'SF '; cnty: 'Finland (FIN) ' ),
(sign: 'SGP '; cnty: 'Singapore ' ),
(sign: 'SK '; cnty: 'Slovakia ' ),
(sign: 'SLE '; cnty: 'Sierra Leone ' ),
(sign: 'SLO '; cnty: 'Slovenia ' ),
(sign: 'SME '; cnty: 'Suriname ' ),
(sign: 'SMO '; cnty: 'Sovereign Military Order of Malta ' ),
(sign: 'SN '; cnty: 'Senegal ' ),
(sign: 'SO '; cnty: 'Somalia ' ),
(sign: 'SOL '; cnty: 'Solomon Islands ' ),
(sign: 'SRB '; cnty: 'Serbia ' ),
(sign: 'STP '; cnty: 'S o Tom‚ and Pr¡ncipe ' ),
(sign: 'SUD '; cnty: 'Sudan ' ),
(sign: 'SY '; cnty: 'Seychelles ' ),
(sign: 'SYR '; cnty: 'Syria ' ),
(sign: 'T '; cnty: 'Thailand ' ),
(sign: 'TCH '; cnty: 'Chad ' ),
(sign: 'TD '; cnty: 'Chad ' ),
(sign: 'TG '; cnty: 'Togo ' ),
(sign: 'TJ '; cnty: 'Tajikistan ' ),
(sign: 'TL '; cnty: 'Timor-Leste ' ),
(sign: 'TM '; cnty: 'Turkmenistan ' ),
(sign: 'TN '; cnty: 'Tunisia ' ),
(sign: 'TO '; cnty: 'Tonga ' ),
(sign: 'TR '; cnty: 'Turkey ' ),
(sign: 'TT '; cnty: 'Trinidad and Tobago ' ),
(sign: 'TUV '; cnty: 'Tuvalu ' ),
(sign: 'UA '; cnty: 'Ukraine ' ),
(sign: 'UAE '; cnty: 'United Arab Emirates ' ),
(sign: 'USA '; cnty: 'United States ' ),
(sign: 'UY '; cnty: 'Uruguay ' ),
(sign: 'UZ '; cnty: 'Uzbekistan ' ),
(sign: 'V '; cnty: 'Vatican City ' ),
(sign: 'VN '; cnty: 'Vietnam ' ),
(sign: 'VU '; cnty: 'Vanuatu ' ),
(sign: 'WAG '; cnty: 'Gambia ' ),
(sign: 'WAL '; cnty: 'Sierra Leone ' ),
(sign: 'WAN '; cnty: 'Nigeria ' ),
(sign: 'WD '; cnty: 'Dominica ' ),
(sign: 'WG '; cnty: 'Grenada ' ),
(sign: 'WL '; cnty: 'Saint Lucia ' ),
(sign: 'WS '; cnty: 'Samoa ' ),
(sign: 'WSA '; cnty: 'Western Sahara ' ),
(sign: 'WV '; cnty: 'Saint Vincent and the Grenadines ' ),
(sign: 'YAR '; cnty: 'Yemen ' ),
(sign: 'YU '; cnty: 'Yugoslavia ' ),
(sign: 'YV '; cnty: 'Venezuela ' ),
(sign: 'Z '; cnty: 'Zambia ' ),
(sign: 'ZA '; cnty: 'South Africa ' ),
(sign: 'ZW '; cnty: 'Zimbabwe ' ),

And as you can see, there are multiple entries for four countries,

GB/UK,
MK/NMK,
ROU/UY, and
SF/FIN

which have decided to change their country codes after I had a first ride with a
national of them, the few other users of my program might be using the new
codes, but as long as only one of the two is used,it should not matter, but
returning the same hash for both would be magic. The table should possibly also
have included DDR, SU to name just two countries that no longer exist. Are there
more, don't really know...

Robert
--
Robert AH Prins
robert(a)prino(d)org
The hitchhiking grandfather - https://prino.neocities.org/
Some REXX code for use on z/OS - https://prino.neocities.org/zOS/zOS-Tools.html
Post by Terje Mathisen
Terje
Post by Robert Prins
Post by Terje Mathisen
This one cries out for a hash table!
Yes, but with the 1/2/3 letter codes being the (un)official car licence plate
codes of all 195 independent countries, with some extra odd-man-out ones like
GBJ/GBG, and the oddest, SMA, I've given this a try, but GNU's GPERF doesn't
come up with anything. And yes, I've also read
<http://courses.cs.vt.edu/~cs3114/Fall10/Notes/Supplemental/p17-cichelli.pdf> ;)
Post by Terje Mathisen
Since you have a fixed set of codes, you can spend a little time finding a
fast hash that converts this into a single byte value with zero collisions,
at which point the hash entry is your country code?
I've tried I don't know how many variations, but nothing has come close, and
although it will be slower than hashing, it might well be that a simple lookup
of the first three of four most common countries/nationalities (D/B/LT/GB)
might be a decent compromise, I did use this technique at a Belgian employer
where the three most common nationalities were B, NL, and F, accounting for
more than 90% of the insured. The best I got out of GPERF was a table of 425
entries, but the email from 2016 didn't include the hash function. :( But
being someone who rarely, if ever, deletes something, it might still be around.
Post by Terje Mathisen
Try something like c[0]+c[1]*2+c[2]*4 as a first attempt, then mix in a few
XORs or rotations instead of those SHL only ops.
It should be doable in well under 10 cycles and maybe under 5...
Which would handsomely beat a binary lookup...
Post by Terje Mathisen
If a near-perfect hash is too slow or hard to find, then you could allow a
fixed maximal number of collisions (like max two entries) and scan those, but
that needs room for the original values in the hash table.
I've got the code that Borland used in TP (4?/5?6/7) to do hashing, and that
should be (relatively) easy to adapt.
Robert
Terje Mathisen
2023-02-23 20:59:54 UTC
Permalink
Robert, are you intentionally trying to annoy me? :-)

My own N, Norway is missing!

I'll take a look. In the meantime, have you considered an extremely
simple single-entry cache: Last country used?

I am very confident that it is likely that any given user will tend to
pick up subsequent rides from another car in the same area, right?

If you want to be really fancy then you back that single entry up with 3
or 7 more most recently used, pushing the oldest out when you find a new
country. You resort on each new country
Post by Terje Mathisen
Can you please post the full list? comma, space or newline separated?
(sign: 'A   '; cnty: 'Austria                                 '   ),
(sign: 'AFG '; cnty: 'Afghanistan                             '   ),
(sign: 'AG  '; cnty: 'Antigua and Barbuda                     '   ),
(sign: 'AL  '; cnty: 'Albania                                 '   ),
(sign: 'AM  '; cnty: 'Armenia                                 '   ),
(sign: 'AND '; cnty: 'Andorra                                 '   ),
(sign: 'ANG '; cnty: 'Angola                                  '   ),
(sign: 'ARK '; cnty: 'Antarctica                              '   ),
(sign: 'ARU '; cnty: 'Aruba                                   '   ),
(sign: 'AS  '; cnty: 'Asturias                                '   ),
(sign: 'AUA '; cnty: 'Aruba                                   '   ),
(sign: 'AUS '; cnty: 'Australia                               '   ),
(sign: 'AX  '; cnty: 'Aland Islands                           '   ),
(sign: 'AXA '; cnty: 'Anguilla                                '   ),
(sign: 'AZ  '; cnty: 'Azerbaijan                              '   ),
(sign: 'B   '; cnty: 'Belgium                                 '   ),
(sign: 'BD  '; cnty: 'Bangladesh                              '   ),
(sign: 'BDS '; cnty: 'Barbados                                '   ),
(sign: 'BF  '; cnty: 'Burkina Faso                            '   ),
(sign: 'BG  '; cnty: 'Bulgaria                                '   ),
(sign: 'BH  '; cnty: 'Belize                                  '   ),
(sign: 'BHT '; cnty: 'Bhutan                                  '   ),
(sign: 'BIH '; cnty: 'Bosnia and Herzegovina                  '   ),
(sign: 'BOL '; cnty: 'Bolivia                                 '   ),
(sign: 'BR  '; cnty: 'Brazil                                  '   ),
(sign: 'BRN '; cnty: 'Bahrain                                 '   ),
(sign: 'BRU '; cnty: 'Brunei                                  '   ),
(sign: 'BS  '; cnty: 'Bahamas                                 '   ),
(sign: 'BUR '; cnty: 'Myanmar                                 '   ),
(sign: 'BVI '; cnty: 'British Virgin Islands                  '   ),
(sign: 'BW  '; cnty: 'Botswana                                '   ),
(sign: 'BY  '; cnty: 'Belarus                                 '   ),
(sign: 'BZ  '; cnty: 'Belize                                  '   ),
(sign: 'C   '; cnty: 'Cuba                                    '   ),
(sign: 'CAM '; cnty: 'Cameroon                                '   ),
(sign: 'CDN '; cnty: 'Canada                                  '   ),
(sign: 'CGO '; cnty: 'Democratic Republic of the Congo        '   ),
(sign: 'CH  '; cnty: 'Switzerland                             '   ),
(sign: 'CHN '; cnty: 'People''s Republic of China              '   ),
(sign: 'CI  '; cnty: 'Ivory Coast (Cote d''Ivoire)             '   ),
(sign: 'CL  '; cnty: 'Sri Lanka                               '   ),
(sign: 'CO  '; cnty: 'Colombia                                '   ),
(sign: 'COM '; cnty: 'Comoros                                 '   ),
(sign: 'CR  '; cnty: 'Costa Rica                              '   ),
(sign: 'CV  '; cnty: 'Cape Verde                              '   ),
(sign: 'CY  '; cnty: 'Cyprus                                  '   ),
(sign: 'CYM '; cnty: 'Wales                                   '   ),
(sign: 'CZ  '; cnty: 'Czech Republic                          '   ),
(sign: 'D   '; cnty: 'Germany                                 '   ),
(sign: 'DJI '; cnty: 'Djibouti                                '   ),
(sign: 'DK  '; cnty: 'Denmark                                 '   ),
(sign: 'DOM '; cnty: 'Dominican Republic                      '   ),
(sign: 'DY  '; cnty: 'Benin                                   '   ),
(sign: 'DZ  '; cnty: 'Algeria                                 '   ),
(sign: 'E   '; cnty: 'Spain                                   '   ),
(sign: 'EAK '; cnty: 'Kenya                                   '   ),
(sign: 'EAT '; cnty: 'Tanzania                                '   ),
(sign: 'EAU '; cnty: 'Uganda                                  '   ),
(sign: 'EAZ '; cnty: 'Zanzibar                                '   ),
(sign: 'EC  '; cnty: 'Ecuador                                 '   ),
(sign: 'ENG '; cnty: 'England                                 '   ),
(sign: 'ER  '; cnty: 'Eritrea                                 '   ),
(sign: 'ES  '; cnty: 'El Salvador                             '   ),
(sign: 'EST '; cnty: 'Estonia                                 '   ),
(sign: 'ET  '; cnty: 'Egypt                                   '   ),
(sign: 'ETH '; cnty: 'Ethiopia                                '   ),
(sign: 'F   '; cnty: 'France                                  '   ),
(sign: 'FIN '; cnty: 'Finland                                 '   ),
(sign: 'FJI '; cnty: 'Fiji                                    '   ),
(sign: 'FL  '; cnty: 'Liechtenstein                           '   ),
(sign: 'FO  '; cnty: 'Faroe Islands                           '   ),
(sign: 'FSM '; cnty: 'Federated States of Micronesia          '   ),
(sign: 'G   '; cnty: 'Gabon                                   '   ),
(sign: 'GB  '; cnty: 'United Kingdom (UK)                     '   ),
(sign: 'GBA '; cnty: 'Alderney                                '   ),
(sign: 'GBG '; cnty: 'Guernsey                                '   ),
(sign: 'GBJ '; cnty: 'Jersey                                  '   ),
(sign: 'GBM '; cnty: 'Isle of Man                             '   ),
(sign: 'GBZ '; cnty: 'Gibraltar                               '   ),
(sign: 'GCA '; cnty: 'Guatemala                               '   ),
(sign: 'GE  '; cnty: 'Georgia                                 '   ),
(sign: 'GH  '; cnty: 'Ghana                                   '   ),
(sign: 'GQ  '; cnty: 'Equatorial Guinea                       '   ),
(sign: 'GR  '; cnty: 'Greece                                  '   ),
(sign: 'GUY '; cnty: 'Guyana                                  '   ),
(sign: 'GW  '; cnty: 'Guinea-Bissau                           '   ),
(sign: 'H   '; cnty: 'Hungary                                 '   ),
(sign: 'HK  '; cnty: 'Hong Kong                               '   ),
(sign: 'HKJ '; cnty: 'Jordan                                  '   ),
(sign: 'HN  '; cnty: 'Honduras                                '   ),
(sign: 'HR  '; cnty: 'Croatia                                 '   ),
(sign: 'I   '; cnty: 'Italy                                   '   ),
(sign: 'IL  '; cnty: 'Israel                                  '   ),
(sign: 'IND '; cnty: 'India                                   '   ),
(sign: 'IR  '; cnty: 'Iran                                    '   ),
(sign: 'IRL '; cnty: 'Ireland                                 '   ),
(sign: 'IRQ '; cnty: 'Iraq                                    '   ),
(sign: 'IS  '; cnty: 'Iceland                                 '   ),
(sign: 'J   '; cnty: 'Japan                                   '   ),
(sign: 'JA  '; cnty: 'Jamaica                                 '   ),
(sign: 'K   '; cnty: 'Cambodia                                '   ),
(sign: 'KAN '; cnty: 'Saint Kitts and Nevis                   '   ),
(sign: 'KGZ '; cnty: 'Kyrgyzstan                              '   ),
(sign: 'KIR '; cnty: 'Kiribati                                '   ),
(sign: 'KN  '; cnty: 'Greenland                               '   ),
(sign: 'KP  '; cnty: 'Democratic People''s Republic of Korea   '   ),
(sign: 'KS  '; cnty: 'Kyrgyzstan                              '   ),
(sign: 'KSA '; cnty: 'Saudi Arabia                            '   ),
(sign: 'KWT '; cnty: 'Kuwait                                  '   ),
(sign: 'KZ  '; cnty: 'Kazakhstan                              '   ),
(sign: 'L   '; cnty: 'Luxembourg                              '   ),
(sign: 'LAO '; cnty: 'Laos                                    '   ),
(sign: 'LAR '; cnty: 'Libya                                   '   ),
(sign: 'LB  '; cnty: 'Liberia                                 '   ),
(sign: 'LS  '; cnty: 'Lesotho                                 '   ),
(sign: 'LT  '; cnty: 'Lithuania                               '   ),
(sign: 'LV  '; cnty: 'Latvia                                  '   ),
(sign: 'M   '; cnty: 'Malta                                   '   ),
(sign: 'MA  '; cnty: 'Morocco                                 '   ),
(sign: 'MAL '; cnty: 'Malaysia                                '   ),
(sign: 'MC  '; cnty: 'Monaco                                  '   ),
(sign: 'MD  '; cnty: 'Moldova                                 '   ),
(sign: 'MEX '; cnty: 'Mexico                                  '   ),
(sign: 'MGL '; cnty: 'Mongolia                                '   ),
(sign: 'MH  '; cnty: 'Marshall Islands                        '   ),
(sign: 'MK  '; cnty: 'Macedonia (NMK)                         '   ),
(sign: 'MNE '; cnty: 'Montenegro                              '   ),
(sign: 'MO  '; cnty: 'Macau                                   '   ),
(sign: 'MOC '; cnty: 'Mozambique                              '   ),
(sign: 'MS  '; cnty: 'Mauritius                               '   ),
(sign: 'MV  '; cnty: 'Maldives                                '   ),
(sign: 'MW  '; cnty: 'Malawi                                  '   ),
(sign: 'N   '; cnty: 'Norway                                  '   ),
(sign: 'NA  '; cnty: 'Netherlands Antilles                    '   ),
(sign: 'NAM '; cnty: 'Namibia                                 '   ),
(sign: 'NAU '; cnty: 'Nauru                                   '   ),
(sign: 'NC  '; cnty: 'New Caledonia                           '   ),
(sign: 'NEP '; cnty: 'Nepal                                   '   ),
(sign: 'NI  '; cnty: 'Northern Ireland                        '   ),
(sign: 'NIC '; cnty: 'Nicaragua                               '   ),
(sign: 'NL  '; cnty: 'Netherlands                             '   ),
(sign: 'NMK '; cnty: 'North Macedonia                         '   ),
(sign: 'NZ  '; cnty: 'New Zealand                             '   ),
(sign: 'OM  '; cnty: 'Oman                                    '   ),
(sign: 'P   '; cnty: 'Portugal                                '   ),
(sign: 'PA  '; cnty: 'Panama                                  '   ),
(sign: 'PAL '; cnty: 'Palau                                   '   ),
(sign: 'PE  '; cnty: 'Peru                                    '   ),
(sign: 'PK  '; cnty: 'Pakistan                                '   ),
(sign: 'PL  '; cnty: 'Poland                                  '   ),
(sign: 'PMR '; cnty: 'Transnistria                            '   ),
(sign: 'PNG '; cnty: 'Papua New Guinea                        '   ),
(sign: 'PR  '; cnty: 'Puerto Rico                             '   ),
(sign: 'PS  '; cnty: 'Palestine                               '   ),
(sign: 'PY  '; cnty: 'Paraguay                                '   ),
(sign: 'Q   '; cnty: 'Qatar                                   '   ),
(sign: 'RA  '; cnty: 'Argentina                               '   ),
(sign: 'RB  '; cnty: 'Botswana                                '   ),
(sign: 'RC  '; cnty: 'Republic of China (Taiwan)              '   ),
(sign: 'RCA '; cnty: 'Central African Republic                '   ),
(sign: 'RCB '; cnty: 'Republic of the Congo                   '   ),
(sign: 'RCH '; cnty: 'Chile                                   '   ),
(sign: 'RG  '; cnty: 'Guinea                                  '   ),
(sign: 'RGB '; cnty: 'Guinea-Bissau                           '   ),
(sign: 'RH  '; cnty: 'Haiti                                   '   ),
(sign: 'RI  '; cnty: 'Indonesia                               '   ),
(sign: 'RIM '; cnty: 'Mauritania                              '   ),
(sign: 'RKS '; cnty: 'Kosovo                                  '   ),
(sign: 'RL  '; cnty: 'Lebanon                                 '   ),
(sign: 'RM  '; cnty: 'Madagascar                              '   ),
(sign: 'RMM '; cnty: 'Mali                                    '   ),
(sign: 'RN  '; cnty: 'Niger                                   '   ),
(sign: 'RO  '; cnty: 'Romania                                 '   ),
(sign: 'ROK '; cnty: 'Republic of Korea                       '   ),
(sign: 'ROU '; cnty: 'Uruguay (UY)                            '   ),
(sign: 'RP  '; cnty: 'Philippines                             '   ),
(sign: 'RSM '; cnty: 'San Marino                              '   ),
(sign: 'RU  '; cnty: 'Burundi                                 '   ),
(sign: 'RUS '; cnty: 'Russia                                  '   ),
(sign: 'RWA '; cnty: 'Rwanda                                  '   ),
(sign: 'S   '; cnty: 'Sweden                                  '   ),
(sign: 'SCO '; cnty: 'Scotland                                '   ),
(sign: 'SCV '; cnty: 'Vatican City                            '   ),
(sign: 'SD  '; cnty: 'Swaziland                               '   ),
(sign: 'SF  '; cnty: 'Finland (FIN)                           '   ),
(sign: 'SGP '; cnty: 'Singapore                               '   ),
(sign: 'SK  '; cnty: 'Slovakia                                '   ),
(sign: 'SLE '; cnty: 'Sierra Leone                            '   ),
(sign: 'SLO '; cnty: 'Slovenia                                '   ),
(sign: 'SME '; cnty: 'Suriname                                '   ),
(sign: 'SMO '; cnty: 'Sovereign Military Order of Malta       '   ),
(sign: 'SN  '; cnty: 'Senegal                                 '   ),
(sign: 'SO  '; cnty: 'Somalia                                 '   ),
(sign: 'SOL '; cnty: 'Solomon Islands                         '   ),
(sign: 'SRB '; cnty: 'Serbia                                  '   ),
(sign: 'STP '; cnty: 'S o Tom‚ and Pr¡ncipe                   '   ),
(sign: 'SUD '; cnty: 'Sudan                                   '   ),
(sign: 'SY  '; cnty: 'Seychelles                              '   ),
(sign: 'SYR '; cnty: 'Syria                                   '   ),
(sign: 'T   '; cnty: 'Thailand                                '   ),
(sign: 'TCH '; cnty: 'Chad                                    '   ),
(sign: 'TD  '; cnty: 'Chad                                    '   ),
(sign: 'TG  '; cnty: 'Togo                                    '   ),
(sign: 'TJ  '; cnty: 'Tajikistan                              '   ),
(sign: 'TL  '; cnty: 'Timor-Leste                             '   ),
(sign: 'TM  '; cnty: 'Turkmenistan                            '   ),
(sign: 'TN  '; cnty: 'Tunisia                                 '   ),
(sign: 'TO  '; cnty: 'Tonga                                   '   ),
(sign: 'TR  '; cnty: 'Turkey                                  '   ),
(sign: 'TT  '; cnty: 'Trinidad and Tobago                     '   ),
(sign: 'TUV '; cnty: 'Tuvalu                                  '   ),
(sign: 'UA  '; cnty: 'Ukraine                                 '   ),
(sign: 'UAE '; cnty: 'United Arab Emirates                    '   ),
(sign: 'USA '; cnty: 'United States                           '   ),
(sign: 'UY  '; cnty: 'Uruguay                                 '   ),
(sign: 'UZ  '; cnty: 'Uzbekistan                              '   ),
(sign: 'V   '; cnty: 'Vatican City                            '   ),
(sign: 'VN  '; cnty: 'Vietnam                                 '   ),
(sign: 'VU  '; cnty: 'Vanuatu                                 '   ),
(sign: 'WAG '; cnty: 'Gambia                                  '   ),
(sign: 'WAL '; cnty: 'Sierra Leone                            '   ),
(sign: 'WAN '; cnty: 'Nigeria                                 '   ),
(sign: 'WD  '; cnty: 'Dominica                                '   ),
(sign: 'WG  '; cnty: 'Grenada                                 '   ),
(sign: 'WL  '; cnty: 'Saint Lucia                             '   ),
(sign: 'WS  '; cnty: 'Samoa                                   '   ),
(sign: 'WSA '; cnty: 'Western Sahara                          '   ),
(sign: 'WV  '; cnty: 'Saint Vincent and the Grenadines        '   ),
(sign: 'YAR '; cnty: 'Yemen                                   '   ),
(sign: 'YU  '; cnty: 'Yugoslavia                              '   ),
(sign: 'YV  '; cnty: 'Venezuela                               '   ),
(sign: 'Z   '; cnty: 'Zambia                                  '   ),
(sign: 'ZA  '; cnty: 'South Africa                            '   ),
(sign: 'ZW  '; cnty: 'Zimbabwe                                '   ),
And as you can see, there are multiple entries for four countries,
GB/UK,
MK/NMK,
ROU/UY, and
SF/FIN
which have decided to change their country codes after I had a first
ride with a national of them, the few other users of my program might be
using the new codes, but as long as only one of the two is used,it
should not matter, but returning the same hash for both would be magic.
The table should possibly also have included DDR, SU to name just two
countries that no longer exist. Are there more, don't really know...
Robert
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Robert Prins
2023-02-24 02:04:14 UTC
Permalink
Post by Terje Mathisen
Robert, are you intentionally trying to annoy me? :-)
My own N, Norway is missing!
Please look again! My program would crash, in 1982 I hitched all the way up to
Nordkapp, that wonderful place where people only go to to wrongly tell their
friends that they've been at the northernmost point of Europe. ;)
Post by Terje Mathisen
I'll take a look. In the meantime, have you considered an extremely simple
single-entry cache: Last country used?
That might very well be the best solution
Post by Terje Mathisen
I am very confident that it is likely that any given user will tend to pick up
subsequent rides from another car in the same area, right?
That's certainly true
Post by Terje Mathisen
If you want to be really fancy then you back that single entry up with 3 or 7
more most recently used, pushing the oldest out when you find a new country. You
resort on each new country
That's what I used in 1996 while working for KLM while building a prototype
payload prediction system (in PL/I). There I didn't sort, but just added a
counter and threw out the item with the lowest value, and moved the new value
into its location, and, obviously, in location 0 of the array.

In this case the extra overhead might be more than doing the binary lookup?
Probably the only way to find out would be to do something in pure DOS using the
full set of data that's now going into the binary lookup, an extra "writeln()"
should produce it, or I could use the "COUNT" option of the old z/OS PL/I
compiler to give me the, what else, counts of executed PL/I statements, which
would give at least a rough idea as to what's the way to go.

I'll give it a try, once I've managed to figure out what I'm doing wrong with
this mainframe PL/I version. Give me some time, family things are right now
sadly taking up rather a lot of time, and don't spoil me with a, if there's one,
ready made solution!

Robert
--
Robert AH Prins
robert(a)prino(d)org
The hitchhiking grandfather - https://prino.neocities.org/
Some REXX code for use on z/OS - https://prino.neocities.org/zOS/zOS-Tools.html
Post by Terje Mathisen
Post by Terje Mathisen
Can you please post the full list? comma, space or newline separated?
(sign: 'A   '; cnty: 'Austria                                 '   ),
(sign: 'AFG '; cnty: 'Afghanistan                             '   ),
(sign: 'AG  '; cnty: 'Antigua and Barbuda                     '   ),
(sign: 'AL  '; cnty: 'Albania                                 '   ),
(sign: 'AM  '; cnty: 'Armenia                                 '   ),
(sign: 'AND '; cnty: 'Andorra                                 '   ),
(sign: 'ANG '; cnty: 'Angola                                  '   ),
(sign: 'ARK '; cnty: 'Antarctica                              '   ),
(sign: 'ARU '; cnty: 'Aruba                                   '   ),
(sign: 'AS  '; cnty: 'Asturias                                '   ),
(sign: 'AUA '; cnty: 'Aruba                                   '   ),
(sign: 'AUS '; cnty: 'Australia                               '   ),
(sign: 'AX  '; cnty: 'Aland Islands                           '   ),
(sign: 'AXA '; cnty: 'Anguilla                                '   ),
(sign: 'AZ  '; cnty: 'Azerbaijan                              '   ),
(sign: 'B   '; cnty: 'Belgium                                 '   ),
(sign: 'BD  '; cnty: 'Bangladesh                              '   ),
(sign: 'BDS '; cnty: 'Barbados                                '   ),
(sign: 'BF  '; cnty: 'Burkina Faso                            '   ),
(sign: 'BG  '; cnty: 'Bulgaria                                '   ),
(sign: 'BH  '; cnty: 'Belize                                  '   ),
(sign: 'BHT '; cnty: 'Bhutan                                  '   ),
(sign: 'BIH '; cnty: 'Bosnia and Herzegovina                  '   ),
(sign: 'BOL '; cnty: 'Bolivia                                 '   ),
(sign: 'BR  '; cnty: 'Brazil                                  '   ),
(sign: 'BRN '; cnty: 'Bahrain                                 '   ),
(sign: 'BRU '; cnty: 'Brunei                                  '   ),
(sign: 'BS  '; cnty: 'Bahamas                                 '   ),
(sign: 'BUR '; cnty: 'Myanmar                                 '   ),
(sign: 'BVI '; cnty: 'British Virgin Islands                  '   ),
(sign: 'BW  '; cnty: 'Botswana                                '   ),
(sign: 'BY  '; cnty: 'Belarus                                 '   ),
(sign: 'BZ  '; cnty: 'Belize                                  '   ),
(sign: 'C   '; cnty: 'Cuba                                    '   ),
(sign: 'CAM '; cnty: 'Cameroon                                '   ),
(sign: 'CDN '; cnty: 'Canada                                  '   ),
(sign: 'CGO '; cnty: 'Democratic Republic of the Congo        '   ),
(sign: 'CH  '; cnty: 'Switzerland                             '   ),
(sign: 'CHN '; cnty: 'People''s Republic of China              '   ),
(sign: 'CI  '; cnty: 'Ivory Coast (Cote d''Ivoire)             '   ),
(sign: 'CL  '; cnty: 'Sri Lanka                               '   ),
(sign: 'CO  '; cnty: 'Colombia                                '   ),
(sign: 'COM '; cnty: 'Comoros                                 '   ),
(sign: 'CR  '; cnty: 'Costa Rica                              '   ),
(sign: 'CV  '; cnty: 'Cape Verde                              '   ),
(sign: 'CY  '; cnty: 'Cyprus                                  '   ),
(sign: 'CYM '; cnty: 'Wales                                   '   ),
(sign: 'CZ  '; cnty: 'Czech Republic                          '   ),
(sign: 'D   '; cnty: 'Germany                                 '   ),
(sign: 'DJI '; cnty: 'Djibouti                                '   ),
(sign: 'DK  '; cnty: 'Denmark                                 '   ),
(sign: 'DOM '; cnty: 'Dominican Republic                      '   ),
(sign: 'DY  '; cnty: 'Benin                                   '   ),
(sign: 'DZ  '; cnty: 'Algeria                                 '   ),
(sign: 'E   '; cnty: 'Spain                                   '   ),
(sign: 'EAK '; cnty: 'Kenya                                   '   ),
(sign: 'EAT '; cnty: 'Tanzania                                '   ),
(sign: 'EAU '; cnty: 'Uganda                                  '   ),
(sign: 'EAZ '; cnty: 'Zanzibar                                '   ),
(sign: 'EC  '; cnty: 'Ecuador                                 '   ),
(sign: 'ENG '; cnty: 'England                                 '   ),
(sign: 'ER  '; cnty: 'Eritrea                                 '   ),
(sign: 'ES  '; cnty: 'El Salvador                             '   ),
(sign: 'EST '; cnty: 'Estonia                                 '   ),
(sign: 'ET  '; cnty: 'Egypt                                   '   ),
(sign: 'ETH '; cnty: 'Ethiopia                                '   ),
(sign: 'F   '; cnty: 'France                                  '   ),
(sign: 'FIN '; cnty: 'Finland                                 '   ),
(sign: 'FJI '; cnty: 'Fiji                                    '   ),
(sign: 'FL  '; cnty: 'Liechtenstein                           '   ),
(sign: 'FO  '; cnty: 'Faroe Islands                           '   ),
(sign: 'FSM '; cnty: 'Federated States of Micronesia          '   ),
(sign: 'G   '; cnty: 'Gabon                                   '   ),
(sign: 'GB  '; cnty: 'United Kingdom (UK)                     '   ),
(sign: 'GBA '; cnty: 'Alderney                                '   ),
(sign: 'GBG '; cnty: 'Guernsey                                '   ),
(sign: 'GBJ '; cnty: 'Jersey                                  '   ),
(sign: 'GBM '; cnty: 'Isle of Man                             '   ),
(sign: 'GBZ '; cnty: 'Gibraltar                               '   ),
(sign: 'GCA '; cnty: 'Guatemala                               '   ),
(sign: 'GE  '; cnty: 'Georgia                                 '   ),
(sign: 'GH  '; cnty: 'Ghana                                   '   ),
(sign: 'GQ  '; cnty: 'Equatorial Guinea                       '   ),
(sign: 'GR  '; cnty: 'Greece                                  '   ),
(sign: 'GUY '; cnty: 'Guyana                                  '   ),
(sign: 'GW  '; cnty: 'Guinea-Bissau                           '   ),
(sign: 'H   '; cnty: 'Hungary                                 '   ),
(sign: 'HK  '; cnty: 'Hong Kong                               '   ),
(sign: 'HKJ '; cnty: 'Jordan                                  '   ),
(sign: 'HN  '; cnty: 'Honduras                                '   ),
(sign: 'HR  '; cnty: 'Croatia                                 '   ),
(sign: 'I   '; cnty: 'Italy                                   '   ),
(sign: 'IL  '; cnty: 'Israel                                  '   ),
(sign: 'IND '; cnty: 'India                                   '   ),
(sign: 'IR  '; cnty: 'Iran                                    '   ),
(sign: 'IRL '; cnty: 'Ireland                                 '   ),
(sign: 'IRQ '; cnty: 'Iraq                                    '   ),
(sign: 'IS  '; cnty: 'Iceland                                 '   ),
(sign: 'J   '; cnty: 'Japan                                   '   ),
(sign: 'JA  '; cnty: 'Jamaica                                 '   ),
(sign: 'K   '; cnty: 'Cambodia                                '   ),
(sign: 'KAN '; cnty: 'Saint Kitts and Nevis                   '   ),
(sign: 'KGZ '; cnty: 'Kyrgyzstan                              '   ),
(sign: 'KIR '; cnty: 'Kiribati                                '   ),
(sign: 'KN  '; cnty: 'Greenland                               '   ),
(sign: 'KP  '; cnty: 'Democratic People''s Republic of Korea   '   ),
(sign: 'KS  '; cnty: 'Kyrgyzstan                              '   ),
(sign: 'KSA '; cnty: 'Saudi Arabia                            '   ),
(sign: 'KWT '; cnty: 'Kuwait                                  '   ),
(sign: 'KZ  '; cnty: 'Kazakhstan                              '   ),
(sign: 'L   '; cnty: 'Luxembourg                              '   ),
(sign: 'LAO '; cnty: 'Laos                                    '   ),
(sign: 'LAR '; cnty: 'Libya                                   '   ),
(sign: 'LB  '; cnty: 'Liberia                                 '   ),
(sign: 'LS  '; cnty: 'Lesotho                                 '   ),
(sign: 'LT  '; cnty: 'Lithuania                               '   ),
(sign: 'LV  '; cnty: 'Latvia                                  '   ),
(sign: 'M   '; cnty: 'Malta                                   '   ),
(sign: 'MA  '; cnty: 'Morocco                                 '   ),
(sign: 'MAL '; cnty: 'Malaysia                                '   ),
(sign: 'MC  '; cnty: 'Monaco                                  '   ),
(sign: 'MD  '; cnty: 'Moldova                                 '   ),
(sign: 'MEX '; cnty: 'Mexico                                  '   ),
(sign: 'MGL '; cnty: 'Mongolia                                '   ),
(sign: 'MH  '; cnty: 'Marshall Islands                        '   ),
(sign: 'MK  '; cnty: 'Macedonia (NMK)                         '   ),
(sign: 'MNE '; cnty: 'Montenegro                              '   ),
(sign: 'MO  '; cnty: 'Macau                                   '   ),
(sign: 'MOC '; cnty: 'Mozambique                              '   ),
(sign: 'MS  '; cnty: 'Mauritius                               '   ),
(sign: 'MV  '; cnty: 'Maldives                                '   ),
(sign: 'MW  '; cnty: 'Malawi                                  '   ),
(sign: 'N   '; cnty: 'Norway                                  '   ),
(sign: 'NA  '; cnty: 'Netherlands Antilles                    '   ),
(sign: 'NAM '; cnty: 'Namibia                                 '   ),
(sign: 'NAU '; cnty: 'Nauru                                   '   ),
(sign: 'NC  '; cnty: 'New Caledonia                           '   ),
(sign: 'NEP '; cnty: 'Nepal                                   '   ),
(sign: 'NI  '; cnty: 'Northern Ireland                        '   ),
(sign: 'NIC '; cnty: 'Nicaragua                               '   ),
(sign: 'NL  '; cnty: 'Netherlands                             '   ),
(sign: 'NMK '; cnty: 'North Macedonia                         '   ),
(sign: 'NZ  '; cnty: 'New Zealand                             '   ),
(sign: 'OM  '; cnty: 'Oman                                    '   ),
(sign: 'P   '; cnty: 'Portugal                                '   ),
(sign: 'PA  '; cnty: 'Panama                                  '   ),
(sign: 'PAL '; cnty: 'Palau                                   '   ),
(sign: 'PE  '; cnty: 'Peru                                    '   ),
(sign: 'PK  '; cnty: 'Pakistan                                '   ),
(sign: 'PL  '; cnty: 'Poland                                  '   ),
(sign: 'PMR '; cnty: 'Transnistria                            '   ),
(sign: 'PNG '; cnty: 'Papua New Guinea                        '   ),
(sign: 'PR  '; cnty: 'Puerto Rico                             '   ),
(sign: 'PS  '; cnty: 'Palestine                               '   ),
(sign: 'PY  '; cnty: 'Paraguay                                '   ),
(sign: 'Q   '; cnty: 'Qatar                                   '   ),
(sign: 'RA  '; cnty: 'Argentina                               '   ),
(sign: 'RB  '; cnty: 'Botswana                                '   ),
(sign: 'RC  '; cnty: 'Republic of China (Taiwan)              '   ),
(sign: 'RCA '; cnty: 'Central African Republic                '   ),
(sign: 'RCB '; cnty: 'Republic of the Congo                   '   ),
(sign: 'RCH '; cnty: 'Chile                                   '   ),
(sign: 'RG  '; cnty: 'Guinea                                  '   ),
(sign: 'RGB '; cnty: 'Guinea-Bissau                           '   ),
(sign: 'RH  '; cnty: 'Haiti                                   '   ),
(sign: 'RI  '; cnty: 'Indonesia                               '   ),
(sign: 'RIM '; cnty: 'Mauritania                              '   ),
(sign: 'RKS '; cnty: 'Kosovo                                  '   ),
(sign: 'RL  '; cnty: 'Lebanon                                 '   ),
(sign: 'RM  '; cnty: 'Madagascar                              '   ),
(sign: 'RMM '; cnty: 'Mali                                    '   ),
(sign: 'RN  '; cnty: 'Niger                                   '   ),
(sign: 'RO  '; cnty: 'Romania                                 '   ),
(sign: 'ROK '; cnty: 'Republic of Korea                       '   ),
(sign: 'ROU '; cnty: 'Uruguay (UY)                            '   ),
(sign: 'RP  '; cnty: 'Philippines                             '   ),
(sign: 'RSM '; cnty: 'San Marino                              '   ),
(sign: 'RU  '; cnty: 'Burundi                                 '   ),
(sign: 'RUS '; cnty: 'Russia                                  '   ),
(sign: 'RWA '; cnty: 'Rwanda                                  '   ),
(sign: 'S   '; cnty: 'Sweden                                  '   ),
(sign: 'SCO '; cnty: 'Scotland                                '   ),
(sign: 'SCV '; cnty: 'Vatican City                            '   ),
(sign: 'SD  '; cnty: 'Swaziland                               '   ),
(sign: 'SF  '; cnty: 'Finland (FIN)                           '   ),
(sign: 'SGP '; cnty: 'Singapore                               '   ),
(sign: 'SK  '; cnty: 'Slovakia                                '   ),
(sign: 'SLE '; cnty: 'Sierra Leone                            '   ),
(sign: 'SLO '; cnty: 'Slovenia                                '   ),
(sign: 'SME '; cnty: 'Suriname                                '   ),
(sign: 'SMO '; cnty: 'Sovereign Military Order of Malta       '   ),
(sign: 'SN  '; cnty: 'Senegal                                 '   ),
(sign: 'SO  '; cnty: 'Somalia                                 '   ),
(sign: 'SOL '; cnty: 'Solomon Islands                         '   ),
(sign: 'SRB '; cnty: 'Serbia                                  '   ),
(sign: 'STP '; cnty: 'S o Tom‚ and Pr¡ncipe                   '   ),
(sign: 'SUD '; cnty: 'Sudan                                   '   ),
(sign: 'SY  '; cnty: 'Seychelles                              '   ),
(sign: 'SYR '; cnty: 'Syria                                   '   ),
(sign: 'T   '; cnty: 'Thailand                                '   ),
(sign: 'TCH '; cnty: 'Chad                                    '   ),
(sign: 'TD  '; cnty: 'Chad                                    '   ),
(sign: 'TG  '; cnty: 'Togo                                    '   ),
(sign: 'TJ  '; cnty: 'Tajikistan                              '   ),
(sign: 'TL  '; cnty: 'Timor-Leste                             '   ),
(sign: 'TM  '; cnty: 'Turkmenistan                            '   ),
(sign: 'TN  '; cnty: 'Tunisia                                 '   ),
(sign: 'TO  '; cnty: 'Tonga                                   '   ),
(sign: 'TR  '; cnty: 'Turkey                                  '   ),
(sign: 'TT  '; cnty: 'Trinidad and Tobago                     '   ),
(sign: 'TUV '; cnty: 'Tuvalu                                  '   ),
(sign: 'UA  '; cnty: 'Ukraine                                 '   ),
(sign: 'UAE '; cnty: 'United Arab Emirates                    '   ),
(sign: 'USA '; cnty: 'United States                           '   ),
(sign: 'UY  '; cnty: 'Uruguay                                 '   ),
(sign: 'UZ  '; cnty: 'Uzbekistan                              '   ),
(sign: 'V   '; cnty: 'Vatican City                            '   ),
(sign: 'VN  '; cnty: 'Vietnam                                 '   ),
(sign: 'VU  '; cnty: 'Vanuatu                                 '   ),
(sign: 'WAG '; cnty: 'Gambia                                  '   ),
(sign: 'WAL '; cnty: 'Sierra Leone                            '   ),
(sign: 'WAN '; cnty: 'Nigeria                                 '   ),
(sign: 'WD  '; cnty: 'Dominica                                '   ),
(sign: 'WG  '; cnty: 'Grenada                                 '   ),
(sign: 'WL  '; cnty: 'Saint Lucia                             '   ),
(sign: 'WS  '; cnty: 'Samoa                                   '   ),
(sign: 'WSA '; cnty: 'Western Sahara                          '   ),
(sign: 'WV  '; cnty: 'Saint Vincent and the Grenadines        '   ),
(sign: 'YAR '; cnty: 'Yemen                                   '   ),
(sign: 'YU  '; cnty: 'Yugoslavia                              '   ),
(sign: 'YV  '; cnty: 'Venezuela                               '   ),
(sign: 'Z   '; cnty: 'Zambia                                  '   ),
(sign: 'ZA  '; cnty: 'South Africa                            '   ),
(sign: 'ZW  '; cnty: 'Zimbabwe                                '   ),
And as you can see, there are multiple entries for four countries,
GB/UK,
MK/NMK,
ROU/UY, and
SF/FIN
which have decided to change their country codes after I had a first ride with
a national of them, the few other users of my program might be using the new
codes, but as long as only one of the two is used,it should not matter, but
returning the same hash for both would be magic. The table should possibly
also have included DDR, SU to name just two countries that no longer exist.
Are there more, don't really know...
Robert
Terje Mathisen
2023-02-25 09:33:26 UTC
Permalink
Post by Robert Prins
Post by Terje Mathisen
Robert, are you intentionally trying to annoy me? :-)
My own N, Norway is missing!
Please look again! My program would crash, in 1982 I hitched all the way
up to Nordkapp, that wonderful place where people only go to to wrongly
tell their friends that they've been at the northernmost point of
Europe. ;)
Yeah, by my guess at least 99.99+% of all visitors to Nordkapp think
they have been to said northernmost point in Europe:

If you want "continental Europe", then the old ferry, now undersea
tunnel to the Magerøya island would disqualify Nordkapp. If you otoh
allow this, like I would do as well since there is a small city on the
island, then your map should show you the 3-5 hour hike needed to get
from the main road to the real tip of the island:

https://mapant.no/?#7/71.184/25.674 Knivskjærodden

You could also make a very good case for

https://mapant.no/?#8/71.130/27.648 Kinnarodden which is on the actual
mainland, just a bit further east and slightly longer south.

The difference of 0.054 degrees corresponds to about 6 km.

Terje
Post by Robert Prins
Post by Terje Mathisen
I'll take a look. In the meantime, have you considered an extremely
simple single-entry cache: Last country used?
That might very well be the best solution
Post by Terje Mathisen
I am very confident that it is likely that any given user will tend to
pick up subsequent rides from another car in the same area, right?
That's certainly true
Post by Terje Mathisen
If you want to be really fancy then you back that single entry up with
3 or 7 more most recently used, pushing the oldest out when you find a
new country. You resort on each new country
That's what I used in 1996 while working for KLM while building a
prototype payload prediction system (in PL/I). There I didn't sort, but
just added a counter and threw out the item with the lowest value, and
moved the new value into its location, and, obviously, in location 0 of
the array.
In this case the extra overhead might be more than doing the binary
lookup? Probably the only way to find out would be to do something in
pure DOS using the full set of data that's now going into the binary
lookup, an extra "writeln()" should produce it, or I could use the
"COUNT" option of the old z/OS PL/I compiler to give me the, what else,
counts of executed PL/I statements, which would give at least a rough
idea as to what's the way to go.
I'll give it a try, once I've managed to figure out what I'm doing wrong
with this mainframe PL/I version. Give me some time, family things are
right now sadly taking up rather a lot of time, and don't spoil me with
a, if there's one, ready made solution!
Robert
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Robert Prins
2023-02-25 13:21:26 UTC
Permalink
Post by Robert Prins
Post by Terje Mathisen
Robert, are you intentionally trying to annoy me? :-)
My own N, Norway is missing!
Please look again! My program would crash, in 1982 I hitched all the way up to
Nordkapp, that wonderful place where people only go to to wrongly tell their
friends that they've been at the northernmost point of Europe. ;)
Yeah, by my guess at least 99.99+% of all visitors to Nordkapp think they have
If you want "continental Europe", then the old ferry, now undersea tunnel to the
Magerøya island would disqualify Nordkapp. If you otoh allow this, like I would
do as well since there is a small city on the island, then your map should show
you the 3-5 hour hike needed to get from the main road to the real tip of the
https://mapant.no/?#7/71.184/25.674  Knivskjærodden
I know, although didn't in 1982, but that year hiking would have been all but
impossible, as the place was still covered in a thick layer of snow, even though
I got there on 20 June and left on 21 June, without having seen the midsummer
sun, as the place was also covered in dense fog. :(
You could also make a very good case for
https://mapant.no/?#8/71.130/27.648 Kinnarodden which is on the actual mainland,
just a bit further east and slightly longer south.
The difference of 0.054 degrees corresponds to about 6 km.
And after this touristic break, let's return to the lookup ;)

Here's what comes out of the current PL/I version, where I actually cache the
last country/nationality, and still get a 90% cache-miss, ouch.

/********************************************************
* SRCH_CNTY: *
* *
* Find the name of a country, given the license plate *
********************************************************/
SRCH_CNTY: PROC(PLATE) RETURNS(FIXED BIN (31)); 3,885
DCL PLATE CHAR (4); 3,885

DCL #L FIXED BIN (31); 3,885
DCL #H FIXED BIN (31); 3,885
DCL #M FIXED BIN (31); 3,885
DCL #P FIXED BIN (31) INIT (1) STATIC; 3,885

/*+-----------------------------------------------------+
| Start of executable code for srch_cnty |
+----------------------------------------------------*/
IF CPLATES(#P).PLATE = PLATE THEN 3,885
DO;
RETURN(#P); 395
END; 0
ELSE 3,490
DO;
#L = LBOUND(CPLATES.PLATE, 1); 3,490
#H = HBOUND(CPLATES.PLATE, 1); 3,490

DO WHILE(#L <= #H); 24,726
#M = (#L + #H) / 2; 24,726

SELECT; 24,726
WHEN (CPLATES(#M).PLATE > PLATE) #H = #M - 1 24,726
WHEN (CPLATES(#M).PLATE < PLATE) #L = #M + 1 13,242
OTHER 3,490
DO;
#P = #M; 3,490
RETURN(#M); 3,490
END; 0
END; 21,236
END; 21,236
END; 0

PUT SKIP LIST('Erroneous plate: <' || PLATE || '>'); 0
SIGNAL ERROR; 0
END SRCH_CNTY; 0

Swapping the two "WHEN" statements makes a bit of difference, the total of
executed statements for the procedure goes up from 203148 to 204880, I guess
that in a grey past I already looked at all "select-when-other" statements to
put them in the (then) best order, it's probably something one would have to do
every now and then, which of course rarely happens - the current versions of the
IBM PL/I compiler sadly no longer have an option to count executed statements,
even if these counts should obviously only serve as a very rough guide to what's
going on, as the above 21,236 times "executed" statement that closes the SELECT
doesn't actually generate any code.

Anyway, I'll double(!) the cache, and see what difference that makes. ;)

Robert
--
Robert AH Prins
robert(a)prino(d)org
The hitchhiking grandfather - https://prino.neocities.org/
Some REXX code for use on z/OS - https://prino.neocities.org/zOS/zOS-Tools.html
Post by Robert Prins
Post by Terje Mathisen
I'll take a look. In the meantime, have you considered an extremely simple
single-entry cache: Last country used?
That might very well be the best solution
Post by Terje Mathisen
I am very confident that it is likely that any given user will tend to pick
up subsequent rides from another car in the same area, right?
That's certainly true
Post by Terje Mathisen
If you want to be really fancy then you back that single entry up with 3 or 7
more most recently used, pushing the oldest out when you find a new country.
You resort on each new country
That's what I used in 1996 while working for KLM while building a prototype
payload prediction system (in PL/I). There I didn't sort, but just added a
counter and threw out the item with the lowest value, and moved the new value
into its location, and, obviously, in location 0 of the array.
In this case the extra overhead might be more than doing the binary lookup?
Probably the only way to find out would be to do something in pure DOS using
the full set of data that's now going into the binary lookup, an extra
"writeln()" should produce it, or I could use the "COUNT" option of the old
z/OS PL/I compiler to give me the, what else, counts of executed PL/I
statements, which would give at least a rough idea as to what's the way to go.
I'll give it a try, once I've managed to figure out what I'm doing wrong with
this mainframe PL/I version. Give me some time, family things are right now
sadly taking up rather a lot of time, and don't spoil me with a, if there's
one, ready made solution!
Robert
Robert Prins
2023-02-27 14:10:56 UTC
Permalink
Post by Robert Prins
Post by Robert Prins
Post by Terje Mathisen
Robert, are you intentionally trying to annoy me? :-)
My own N, Norway is missing!
Please look again! My program would crash, in 1982 I hitched all the way up
to Nordkapp, that wonderful place where people only go to to wrongly tell
their friends that they've been at the northernmost point of Europe. ;)
Yeah, by my guess at least 99.99+% of all visitors to Nordkapp think they have
If you want "continental Europe", then the old ferry, now undersea tunnel to
the Magerøya island would disqualify Nordkapp. If you otoh allow this, like I
would do as well since there is a small city on the island, then your map
should show you the 3-5 hour hike needed to get from the main road to the real
https://mapant.no/?#7/71.184/25.674  Knivskjærodden
I know, although didn't in 1982, but that year hiking would have been all but
impossible, as the place was still covered in a thick layer of snow, even though
I got there on 20 June and left on 21 June, without having seen the midsummer
sun, as the place was also covered in dense fog. :(
You could also make a very good case for
https://mapant.no/?#8/71.130/27.648 Kinnarodden which is on the actual
mainland, just a bit further east and slightly longer south.
The difference of 0.054 degrees corresponds to about 6 km.
And after this touristic break, let's return to the lookup ;)
Here's what comes out of the current PL/I version, where I actually cache the
last country/nationality, and still get a 90% cache-miss, ouch.
<snip PL/I>
Post by Robert Prins
Anyway, I'll double(!) the cache, and see what difference that makes. ;)
Not surprisingly, that doubles the hits, but I still miss 80%.

Any suggestions as to how efficiently quadruple the cache and still make it more
efficient to look up and discard values than the current cache of just one entry
and binary lookup would be welcome, but please don't spend/waste a lot of time
on it.

Robert
--
Robert AH Prins
robert(a)prino(d)org
The hitchhiking grandfather - https://prino.neocities.org/
Some REXX code for use on z/OS - https://prino.neocities.org/zOS/zOS-Tools.html
Terje Mathisen
2023-02-27 20:51:37 UTC
Permalink
Since you have a way to measure hit rates, you obviously have some real
(or synthetic?) access order test data, right?

In that case you should give us a useful sample of that (maybe 10-25%)
and let us try various alternatives, then we can measure average access
time, along with max, min and various percentiles.

I have a very hard time seeing how this could ever take more than maybe
20-30 clock cycles, using a combination of an initial table lookup to
hit the first entry starting with a given letter and then a scan for the
actual country:

The worst case is R with 24 entries, so naively

movzx edi,al
mov edx,firstletter[edi*4 - 'A'*4]
mov ecx,25
repne scasd

Assuming REPNE is too slow we can convert to simpler code:

movzx edi,al
mov edx,firstletter[edi*4 - 'A'*4]
next:
cmp eax,[edi]
lea edi,[edi+4]
jb next
je found
jmp not_found

This will normally run in a single cycle/iteration, but with two
cycles/iteration plus a branch miss at the end, it could take 50-60
cycles to determine that RZZ does not exist.

Looking at the country codes I'm tempted to use all 5 bits from the
first char and augment that with the 3 higher bits from the second, for
a 256-entry lookup table where the correct entry will be found in a
maximum of 6 tries (RA, RB, RC, RCA, RCB, RCH). All other country codes
seems to hit in less than this, so with a 16-bit index into the country
table I get something like this:

; EAX has country code in space-filled field
mov ebx,eax
mov ecx,eax
and ebx,31 ;; 5 bits
shr ecx,10 ;; Bottom char
and ecx,7 ;; 3 bits
lea ebx,[ecx+eax*8] ;; Combine them
movzx ebx,word ptr country_index[ebx*2] ;; 5-8 cycle prefix
next:
cmp eax,country_table[ebx] ;; 4-char country code
lea ebx,[ebx+COUNTRY_RECORD_SIZE]
jb next ;; 1-2 cycle iteration
je found ;; 4-10 cycle branch miss
jmp invalid_country_code

This looks like a guaranteed hit in less than 25 cycles, including a
single branch miss, as long as the country table and country_index[]
(512 bytes) are in $L1.

Terje
Post by Robert Prins
Post by Robert Prins
Post by Terje Mathisen
Post by Robert Prins
Post by Terje Mathisen
Robert, are you intentionally trying to annoy me? :-)
My own N, Norway is missing!
Please look again! My program would crash, in 1982 I hitched all the
way up to Nordkapp, that wonderful place where people only go to to
wrongly tell their friends that they've been at the northernmost
point of Europe. ;)
Yeah, by my guess at least 99.99+% of all visitors to Nordkapp think
If you want "continental Europe", then the old ferry, now undersea
tunnel to the Magerøya island would disqualify Nordkapp. If you otoh
allow this, like I would do as well since there is a small city on
the island, then your map should show you the 3-5 hour hike needed to
https://mapant.no/?#7/71.184/25.674  Knivskjærodden
I know, although didn't in 1982, but that year hiking would have been
all but impossible, as the place was still covered in a thick layer of
snow, even though I got there on 20 June and left on 21 June, without
having seen the midsummer sun, as the place was also covered in dense
fog. :(
Post by Terje Mathisen
You could also make a very good case for
https://mapant.no/?#8/71.130/27.648 Kinnarodden which is on the
actual mainland, just a bit further east and slightly longer south.
The difference of 0.054 degrees corresponds to about 6 km.
And after this touristic break, let's return to the lookup ;)
Here's what comes out of the current PL/I version, where I actually
cache the last country/nationality, and still get a 90% cache-miss, ouch.
<snip PL/I>
Post by Robert Prins
Anyway, I'll double(!) the cache, and see what difference that makes. ;)
Not surprisingly, that doubles the hits, but I still miss 80%.
Any suggestions as to how efficiently quadruple the cache and still make
it more efficient to look up and discard values than the current cache
of just one entry and binary lookup would be welcome, but please don't
spend/waste a lot of time on it.
Robert
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Robert Prins
2023-02-28 13:15:58 UTC
Permalink
Since you have a way to measure hit rates, you obviously have some real (or
synthetic?) access order test data, right?
I run the PL/I version of the program, with the "COUNT" compiler option on a(n
emulated) z/OS system, and that gives me the number of times each statement is
executed, and with two cached values in two separate "return(cache(1|2));"
statements, I know that my cache hits double when I keep the two last most
recently used values, evicting the one with the lowest hit counter, probably not
very sophisticated, but it works.
In that case you should give us a useful sample of that (maybe 10-25%) and let
us try various alternatives, then we can measure average access time, along with
max, min and various percentiles.
I can give you the full set of data, I'll just add a writeln() to the Pascal
version (I don't want to re-IPL z/OS at the moment), zip it up and put it in the
temp directory of my website, and have done just now. The redirected output of
the Pascal version "lift" can be found @
<https://prino.neocities.org/@temp/lift-cache.zip>, with the table of countries
in <https://prino.neocities.org/@temp/hhcommon.zip>

5,408 not cached values, 4,387 cached values, need to figure out why the PL/I
figures are so different.
I have a very hard time seeing how this could ever take more than maybe 20-30
clock cycles, using a combination of an initial table lookup to hit the first
The worst case is R with 24 entries, so naively
  movzx edi,al
  mov edx,firstletter[edi*4 - 'A'*4]
  mov ecx,25
  repne scasd
  movzx edi,al
  mov edx,firstletter[edi*4 - 'A'*4]
  cmp eax,[edi]
  lea edi,[edi+4]
  jb next
  je found
  jmp not_found
This will normally run in a single cycle/iteration, but with two
cycles/iteration plus a branch miss at the end, it could take 50-60 cycles to
determine that RZZ does not exist.
From a performance point of view, I could/should of course only keep the
current 95 countries/nationalities, and let the program warn me with an "UD2"
that I need to add a new one.
Looking at the country codes I'm tempted to use all 5 bits from the first char
and augment that with the 3 higher bits from the second, for a 256-entry lookup
table where the correct entry will be found in a maximum of 6 tries (RA, RB, RC,
RCA, RCB, RCH). All other country codes seems to hit in less than this, so with
; EAX has country code in space-filled field
  mov ebx,eax
  mov ecx,eax
  and ebx,31    ;; 5 bits
  shr ecx,10    ;; Bottom char
  and ecx,7    ;; 3 bits
  lea ebx,[ecx+eax*8]    ;; Combine them
  movzx ebx,word ptr country_index[ebx*2]    ;; 5-8 cycle prefix
  cmp eax,country_table[ebx]    ;; 4-char country code
  lea ebx,[ebx+COUNTRY_RECORD_SIZE]
   jb next                ;; 1-2 cycle iteration
   je found                ;; 4-10 cycle branch miss
   jmp invalid_country_code
This looks like a guaranteed hit in less than 25 cycles, including a single
branch miss, as long as the country table and country_index[] (512 bytes) are in
$L1.
I have no clue as how to figure out how to determine that, I know the CPU (an
i7-4710MQ) has 6Gb cache, and I also now that the z/OS PL/I version uses about
20Mb of heap to keep the input data in various linked lists, and that the x86
versions is likely to use more, as I've rounded up many of the list structures
to 32n bytes, to allow them to be initialised by VMOVDQU instruction.

I'll give your code a try, thanks,

Robert
--
Robert AH Prins
robert(a)prino(d)org
The hitchhiking grandfather - https://prino.neocities.org/
Some REXX code for use on z/OS - https://prino.neocities.org/zOS/zOS-Tools.html
Terje
Post by Robert Prins
Post by Robert Prins
Post by Robert Prins
Post by Terje Mathisen
Robert, are you intentionally trying to annoy me? :-)
My own N, Norway is missing!
Please look again! My program would crash, in 1982 I hitched all the way up
to Nordkapp, that wonderful place where people only go to to wrongly tell
their friends that they've been at the northernmost point of Europe. ;)
Yeah, by my guess at least 99.99+% of all visitors to Nordkapp think they
If you want "continental Europe", then the old ferry, now undersea tunnel to
the Magerøya island would disqualify Nordkapp. If you otoh allow this, like
I would do as well since there is a small city on the island, then your map
should show you the 3-5 hour hike needed to get from the main road to the
https://mapant.no/?#7/71.184/25.674  Knivskjærodden
I know, although didn't in 1982, but that year hiking would have been all but
impossible, as the place was still covered in a thick layer of snow, even
though I got there on 20 June and left on 21 June, without having seen the
midsummer sun, as the place was also covered in dense fog. :(
You could also make a very good case for
https://mapant.no/?#8/71.130/27.648 Kinnarodden which is on the actual
mainland, just a bit further east and slightly longer south.
The difference of 0.054 degrees corresponds to about 6 km.
And after this touristic break, let's return to the lookup ;)
Here's what comes out of the current PL/I version, where I actually cache the
last country/nationality, and still get a 90% cache-miss, ouch.
<snip PL/I>
Post by Robert Prins
Anyway, I'll double(!) the cache, and see what difference that makes. ;)
Not surprisingly, that doubles the hits, but I still miss 80%.
Any suggestions as to how efficiently quadruple the cache and still make it
more efficient to look up and discard values than the current cache of just
one entry and binary lookup would be welcome, but please don't spend/waste a
lot of time on it.
Robert
Robert Prins
2023-03-01 20:16:33 UTC
Permalink
Since you have a way to measure hit rates, you obviously have some real (or
synthetic?) access order test data, right?
In that case you should give us a useful sample of that (maybe 10-25%) and let
us try various alternatives, then we can measure average access time, along with
max, min and various percentiles.
I have a very hard time seeing how this could ever take more than maybe 20-30
clock cycles, using a combination of an initial table lookup to hit the first
The worst case is R with 24 entries, so naively
  movzx edi,al
  mov edx,firstletter[edi*4 - 'A'*4]
  mov ecx,25
  repne scasd
This will work, as it's a straight compare for equality
  movzx edi,al
  mov edx,firstletter[edi*4 - 'A'*4]
  cmp eax,[edi]
  lea edi,[edi+4]
  jb next
  je found
  jmp not_found
I may be wrong, but should you not have included BSWAP instructions, like in my
original binary lookup? (I cannot use MOVBE)
This will normally run in a single cycle/iteration, but with two
cycles/iteration plus a branch miss at the end, it could take 50-60 cycles to
determine that RZZ does not exist.
Looking at the country codes I'm tempted to use all 5 bits from the first char
and augment that with the 3 higher bits from the second, for a 256-entry lookup
table where the correct entry will be found in a maximum of 6 tries (RA, RB, RC,
RCA, RCB, RCH). All other country codes seems to hit in less than this, so with
; EAX has country code in space-filled field
  mov ebx,eax
  mov ecx,eax
  and ebx,31    ;; 5 bits
  shr ecx,10    ;; Bottom char
  and ecx,7    ;; 3 bits
  lea ebx,[ecx+eax*8]    ;; Combine them
lea ebx,[ecx+ebx*8] ; I presume?
  movzx ebx,word ptr country_index[ebx*2]    ;; 5-8 cycle prefix
  cmp eax,country_table[ebx]    ;; 4-char country code
  lea ebx,[ebx+COUNTRY_RECORD_SIZE]
   jb next                ;; 1-2 cycle iteration
   je found                ;; 4-10 cycle branch miss
   jmp invalid_country_code
This looks like a guaranteed hit in less than 25 cycles, including a single
branch miss, as long as the country table and country_index[] (512 bytes) are in
$L1.
OK, now I just have to figure out what's in those two tables.

Robert
--
Robert AH Prins
robert(a)prino(d)org
The hitchhiking grandfather - https://prino.neocities.org/
Some REXX code for use on z/OS - https://prino.neocities.org/zOS/zOS-Tools.html
Terje Mathisen
2023-03-02 08:06:48 UTC
Permalink
To make this work with 32-bit compares we need the most significant
letter (i.e. the first) to be located in the last byte, i.e. the
4-letter country codes must be stored in BE order in the table! We could
also use a secondary table consisting of just the reversed country codes
and a pointer to the actual record: With ~400 countries this would be
3.2KB, then we add 512 bytes for the ~1.6 letter lookup table which
contains 16-bit offsets into the country table. We end up with less than
4 KB total plus the 32/64 bytes per country for the actual table.

It is much better to require a BSWAP when using it later for other stuff
(or simply storing both forms as in that extra helper table) than to
require a BSWAP in the critical path.

BTW, if you actually implement this idea, then I would also compress the
country table by stripping out all the extra spaces: The inverted
country lookup table would contain the actual address of each packed
record. Doing this means that you might halve the size of the table and
avoid the potentially bad issue of having all records aligned on a
larger pwoer of two, since this can easily lead to cache trashing via
limited associativity. (Normally not an issue for anything that fits in
$L1!)

Terje
Post by Robert Prins
Post by Terje Mathisen
Since you have a way to measure hit rates, you obviously have some
real (or synthetic?) access order test data, right?
In that case you should give us a useful sample of that (maybe 10-25%)
and let us try various alternatives, then we can measure average
access time, along with max, min and various percentiles.
I have a very hard time seeing how this could ever take more than
maybe 20-30 clock cycles, using a combination of an initial table
lookup to hit the first entry starting with a given letter and then a
The worst case is R with 24 entries, so naively
   movzx edi,al
   mov edx,firstletter[edi*4 - 'A'*4]
   mov ecx,25
   repne scasd
This will work, as it's a straight compare for equality
Post by Terje Mathisen
   movzx edi,al
   mov edx,firstletter[edi*4 - 'A'*4]
   cmp eax,[edi]
   lea edi,[edi+4]
   jb next
   je found
   jmp not_found
I may be wrong, but should you not have included BSWAP instructions,
like in my original binary lookup? (I cannot use MOVBE)
Post by Terje Mathisen
This will normally run in a single cycle/iteration, but with two
cycles/iteration plus a branch miss at the end, it could take 50-60
cycles to determine that RZZ does not exist.
Looking at the country codes I'm tempted to use all 5 bits from the
first char and augment that with the 3 higher bits from the second,
for a 256-entry lookup table where the correct entry will be found in
a maximum of 6 tries (RA, RB, RC, RCA, RCB, RCH). All other country
codes seems to hit in less than this, so with a 16-bit index into the
; EAX has country code in space-filled field
   mov ebx,eax
   mov ecx,eax
   and ebx,31    ;; 5 bits
   shr ecx,10    ;; Bottom char
   and ecx,7    ;; 3 bits
   lea ebx,[ecx+eax*8]    ;; Combine them
lea ebx,[ecx+ebx*8] ; I presume?
Post by Terje Mathisen
   movzx ebx,word ptr country_index[ebx*2]    ;; 5-8 cycle prefix
   cmp eax,country_table[ebx]    ;; 4-char country code
   lea ebx,[ebx+COUNTRY_RECORD_SIZE]
    jb next                ;; 1-2 cycle iteration
    je found                ;; 4-10 cycle branch miss
    jmp invalid_country_code
This looks like a guaranteed hit in less than 25 cycles, including a
single branch miss, as long as the country table and country_index[]
(512 bytes) are in $L1.
OK, now I just have to figure out what's in those two tables.
Robert
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Robert Prins
2023-03-08 23:49:34 UTC
Permalink
OK, I've gone for what I think is a pretty good solution:

My first try?

I added an array['A'..'Z'] of array [0..1], which contained [low, high] for each
entry, reducing the entries in the "do-while low <= high" loop from a current
40,364 to 14,909, or about 63%.

But then I thought, "Mm, some countries starting with letter "?" seem to occur a
lot more than other countries starting with letter "?", but will only be found
after a few binary chops... So why not expand the ['A'..'Z'] array to an array
of [0..3] arrays, containing [low, high, most-used, 0] and on a no-match first
try the "most-used" index?

Result? Only 2,823 entries into the "do-while low <= high" loop, a reduction of
just over 93%.

Do I make sense, and does this come anywhere close to your (Terje) "almost all
programming can be viewed as an exercise in caching" mantra?

Robert
--
Robert AH Prins
robert(a)prino(d)org
The hitchhiking grandfather - https://prino.neocities.org/
Some REXX code for use on z/OS - https://prino.neocities.org/zOS/zOS-Tools.html
To make this work with 32-bit compares we need the most significant letter (i.e.
the first) to be located in the last byte, i.e. the 4-letter country codes must
be stored in BE order in the table! We could also use a secondary table
consisting of just the reversed country codes and a pointer to the actual
record: With ~400 countries this would be 3.2KB, then we add 512 bytes for the
~1.6 letter lookup table which contains 16-bit offsets into the country table.
We end up with less than 4 KB total plus the 32/64 bytes per country for the
actual table.
It is much better to require a BSWAP when using it later for other stuff (or
simply storing both forms as in that extra helper table) than to require a BSWAP
in the critical path.
BTW, if you actually implement this idea, then I would also compress the country
table by stripping out all the extra spaces: The inverted country lookup table
would contain the actual address of each packed record. Doing this means that
you might halve the size of the table and avoid the potentially bad issue of
having all records aligned on a larger pwoer of two, since this can easily lead
to cache trashing via limited associativity. (Normally not an issue for anything
that fits in $L1!)
Terje
Post by Robert Prins
Since you have a way to measure hit rates, you obviously have some real (or
synthetic?) access order test data, right?
In that case you should give us a useful sample of that (maybe 10-25%) and
let us try various alternatives, then we can measure average access time,
along with max, min and various percentiles.
I have a very hard time seeing how this could ever take more than maybe 20-30
clock cycles, using a combination of an initial table lookup to hit the first
The worst case is R with 24 entries, so naively
   movzx edi,al
   mov edx,firstletter[edi*4 - 'A'*4]
   mov ecx,25
   repne scasd
This will work, as it's a straight compare for equality
   movzx edi,al
   mov edx,firstletter[edi*4 - 'A'*4]
   cmp eax,[edi]
   lea edi,[edi+4]
   jb next
   je found
   jmp not_found
I may be wrong, but should you not have included BSWAP instructions, like in
my original binary lookup? (I cannot use MOVBE)
This will normally run in a single cycle/iteration, but with two
cycles/iteration plus a branch miss at the end, it could take 50-60 cycles to
determine that RZZ does not exist.
Looking at the country codes I'm tempted to use all 5 bits from the first
char and augment that with the 3 higher bits from the second, for a 256-entry
lookup table where the correct entry will be found in a maximum of 6 tries
(RA, RB, RC, RCA, RCB, RCH). All other country codes seems to hit in less
than this, so with a 16-bit index into the country table I get something like
; EAX has country code in space-filled field
   mov ebx,eax
   mov ecx,eax
   and ebx,31    ;; 5 bits
   shr ecx,10    ;; Bottom char
   and ecx,7    ;; 3 bits
   lea ebx,[ecx+eax*8]    ;; Combine them
lea ebx,[ecx+ebx*8] ; I presume?
   movzx ebx,word ptr country_index[ebx*2]    ;; 5-8 cycle prefix
   cmp eax,country_table[ebx]    ;; 4-char country code
   lea ebx,[ebx+COUNTRY_RECORD_SIZE]
    jb next                ;; 1-2 cycle iteration
    je found                ;; 4-10 cycle branch miss
    jmp invalid_country_code
This looks like a guaranteed hit in less than 25 cycles, including a single
branch miss, as long as the country table and country_index[] (512 bytes) are
in $L1.
OK, now I just have to figure out what's in those two tables.
Robert
Terje Mathisen
2023-03-09 10:19:33 UTC
Permalink
This is much better!

I still think you should pay more attention to the cost of branches,
i.e. a sequential scan of a shortish segment of the country array is
very cheap, on the order of 2-3 clock cycles/country, while a single
mispredicted branch cost 8-20 cycles depending upon the CPU you are using.

Assuming this really is so important, why not "waste" a bit more lookup
table RAM for a full 2-char index?

This is still only 26x27 (including space)->722 entries, each needing 16
bits so 1444 bytes.

Looking at your country list I see the following long IDs that need a
short scan after the lookup:

AND,ANG
ARK,ARU
AUA,AUS
AX,AXA

BD,BDS
BH,BHT
BR,BRN,BRU

CH,CHN
CO,COM
CY,CYM

EAK,EAT,EAU,EAZ
ES,EST
ET,ETH

GB,GBA,GBG,GBJ,GBM,GBZ

HK,HKJ

IT,IRL,IRQ

LAO,LAR

MA,MAL
MO,MOC

NA,NAM,NAU
NI,NIC

PA,PAL

RC,RCA,RCB,RCH
RG,RGB
RI,RIM
RM,RMM
RO,ROK,ROU
RU,RUS

SLE,SLO
SME,SMO
SO,SOL
SY,SYR

UA,UAE

WAG,WAL,WAN
WS,WSA

All the remaining countries will point to a single entry, and for the
worst case here, starting with GB which has 6 entries, the first one
(GB) is by far the most common one, right? There are only 48 countries
which are not unique or the first entry for a given combination!

mov ebx,[country] ;; LE byte order, so first byte in AL
mov eax,ebx
and ebx,1f1fh
xor ecx,ecx
imul cl,bl,27
shr ebx,8 ;; key = (name[1] & 31) + ((name[0] & 31) * 27)
add ebx,ecx
bswap eax ;; Big endian for faster compares

movzx ebx,word ptr twobyte_table[ebx*2]
next:
cmp eax,country_table[ebx] ;; Each entry starts with BE name
je found ;; This will happen immediately for
;; all except 48 countries
lea ebx,[ebx+COUNTRY_TABLE_ENTRY_SIZE]
jb next
not_found: ;; Invalid/missing country code

BTW, from the size of the worst case entry, GB*, it is obvious that
26*27*6=4212 entries in a hash/lookup table will be enough to find all
entries directly, with zero searching.

I really cannot see any circumstance that would require such
speed/optimization for this trivial problem!

Terje
Post by Robert Prins
My first try?
I added an array['A'..'Z'] of array [0..1], which contained [low, high]
for each entry, reducing the entries in the "do-while low <= high" loop
from a current 40,364 to 14,909, or about 63%.
But then I thought, "Mm, some countries starting with letter "?" seem to
occur a lot more than other countries starting with letter "?", but will
only be found after a few binary chops... So why not expand the
['A'..'Z'] array to an array of [0..3] arrays, containing [low, high,
most-used, 0] and on a no-match first try the "most-used" index?
Result? Only 2,823 entries into the "do-while low <= high" loop, a
reduction of just over 93%.
Do I make sense, and does this come anywhere close to your (Terje)
"almost all programming can be viewed as an exercise in caching" mantra?
Robert
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Robert Prins
2023-03-09 15:28:52 UTC
Permalink
Post by Terje Mathisen
This is much better!
Thank you!
Post by Terje Mathisen
I still think you should pay more attention to the cost of branches, i.e. a
sequential scan of a shortish segment of the country array is very cheap, on the
order of 2-3 clock cycles/country, while a single mispredicted branch cost 8-20
cycles depending upon the CPU you are using.
An AMD FX8150 and an Intel i7-4710MQ.
Post by Terje Mathisen
Assuming this really is so important, why not "waste" a bit more lookup table
RAM for a full 2-char index?
It's Utterly Unimportant (with two huge "U"s), the program as-is runs in 0.32
seconds clock-time, but it's just interesting to see how far I can go in pure
Pascal, to translate that subsequently into in-line assembler, where I'm in many
cases harking back to Turbo Pascal 3 times, by coding post-Pentium instructions
as "db" sequences, the original code dates back to 1994, and pretty amazingly,
many of my original record structures are very AVX friendly!

Anyway, given that my ['A'..'Z'] array had one position open, I've used that to
replace the general last-matched country with a per-initial character
last-matched country, and that had cut the chop-count even more:

Original full range : 40,362
Low/high per initial character (pic): 14,907
As previous + first most used pic : 2,871
As previous + cache pic : 1,747

And now I'm going to think slightly outside the box, but will have to knock up a
bit of REXX to actually test this, by what might be counter-intuitive, upping
the low and/or high value of some ranges into the previous/next (or, unlikely,
but never say never, pre-previous or over-next) range, so that an initial first
lookup immediately finds the second most-used country, without generating an
excessive number of additional lookups for even less used countries.

E.g. for countries starting with "L" I currently only have three, in order of
use, "LT", "LV" (74 lookups), and "L" (45 lookups), both requiring three chops
for a total of 357 chops. If I were to extend the "L" range into the "M" range
so that a first chop returns "LV" I save 148 chops, which means that even if I
need up to three more chops for "L", I still save time, and for this specific
case, extending the "Lx" range to "MEX" gives me single-chop hits on "LV" and
still the same three-chop hits on "L", so reducing the number of chops to 209.
In this specific case, I could go even further, and take the lo-range back into
the "K" countries and get to "L" in just two chops.

Doing the same for the also easy to try three additional (after "GB") "G*"
countries saves me another 96 chops if I put "GR", the second most used, in the
initial middle of the range (by extending the high value to "IR"), without
affecting "GE" and "GH"chop counts.

I'm going to see how this works out before doing anything else, as this change
would be easy to implement in Pascal and in my PL/I version of the same on z/OS,
I keep the two and they should give the same output, if they don't I know that
I've screwed up somewhere!
Post by Terje Mathisen
This is still only 26x27 (including space)->722 entries, each needing 16 bits so
1444 bytes.
Looking at your country list I see the following long IDs that need a short scan
AND,ANG
ARK,ARU
AUA,AUS
AX,AXA
BD,BDS
BH,BHT
BR,BRN,BRU
CH,CHN
CO,COM
CY,CYM
EAK,EAT,EAU,EAZ
ES,EST
ET,ETH
GB,GBA,GBG,GBJ,GBM,GBZ
HK,HKJ
IT,IRL,IRQ
LAO,LAR
MA,MAL
MO,MOC
NA,NAM,NAU
NI,NIC
PA,PAL
RC,RCA,RCB,RCH
RG,RGB
RI,RIM
RM,RMM
RO,ROK,ROU
RU,RUS
SLE,SLO
SME,SMO
SO,SOL
SY,SYR
UA,UAE
WAG,WAL,WAN
WS,WSA
…and most of them would not even be required. The most optimal solution would be
to write a pre-processor that actually builds the plates and index-to-the-plates
tables from the input file, potentially even with the range extensions, for only
the countries/nationalities that are used. And of course right now I already
could throw out duplicates (GB/UK, MK/NMK, ROU/UY, SF/FIN) and various regions
like ENG/SCO/NI/CYM (not the various GBx'es), and stick to the 193 UN countries,
and the 6 non-official ones.

<snip code, look at it RSN…>

As I already wrote, there are abso-(fill in your favourite strong word)-lutely
no circumstances whatsoever to do this, other than and it's still too early/cold
to start work in the garden and greenhouse for the summer. ;) However, it keeps
my mind sharp(ish), just in case that elusive z/OS PL/I job comes along sometime
in the next two or three years, before I retire.

Anyway thanks for your feedback, I'll keep you posted with the results of my
further experiments.

Robert
--
Robert AH Prins
robert(a)prino(d)org
The hitchhiking grandfather - https://prino.neocities.org/
Some REXX code for use on z/OS - https://prino.neocities.org/zOS/zOS-Tools.html
Post by Terje Mathisen
Terje
Post by Robert Prins
My first try?
I added an array['A'..'Z'] of array [0..1], which contained [low, high] for
each entry, reducing the entries in the "do-while low <= high" loop from a
current 40,364 to 14,909, or about 63%.
But then I thought, "Mm, some countries starting with letter "?" seem to occur
a lot more than other countries starting with letter "?", but will only be
found after a few binary chops... So why not expand the ['A'..'Z'] array to an
array of [0..3] arrays, containing [low, high, most-used, 0] and on a no-match
first try the "most-used" index?
Result? Only 2,823 entries into the "do-while low <= high" loop, a reduction
of just over 93%.
Do I make sense, and does this come anywhere close to your (Terje) "almost all
programming can be viewed as an exercise in caching" mantra?
Loading...