Bad Hashing Function

Bad Hash Function

There’s not a lot of science involved in today’s short post. It’s more an excuse to write some SQL queries, plot some graphs, and create a few random pieces of trivia.

Substitution Cypher

There’s a popular substitution cypher used by kids which replaces each letter with its numerical position in the English alphabet.

A=1 B=2 C=3 … Z=26

So, for instance, using the cypher, the word KISS changes to 11-9-19-19.

For today’s post we’re going to take a dictionary of English words, replace each letter with its number, then sum these values together to produce a single number for each word.

For example, using the word CAUSE we get a score of 3+1+21+19+5 = 49.
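As a minimal sketch of the scoring described above, here is one way it might look in Python. The function names (`letter_value`, `encode`, `bad_hash`) are made up for illustration and are not taken from the original queries; the sketch also assumes words contain only the letters A to Z.

```python
def letter_value(ch: str) -> int:
    """Return the 1-based alphabet position of an upper-case letter (A=1 ... Z=26)."""
    if not ("A" <= ch <= "Z"):
        raise ValueError(f"not an English letter: {ch!r}")
    return ord(ch) - ord("A") + 1


def encode(word: str) -> str:
    """Apply the substitution cypher: each letter becomes its position."""
    return "-".join(str(letter_value(ch)) for ch in word.upper())


def bad_hash(word: str) -> int:
    """Sum the letter positions to get a single score for the word."""
    return sum(letter_value(ch) for ch in word.upper())


print(encode("KISS"))     # 11-9-19-19
print(bad_hash("CAUSE"))  # 49
```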

Bad Hash

This score is a (very poor) HASH function for our words. Incredibly poor. I hope you can see why (you could probably write an entire book on why this is a poor choice for a hash function). Here, in no particular order, is a selection of the things that this function does poorly.
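Two of the weaknesses are easy to demonstrate. Because addition ignores order, every anagram of a word gets the same score, and because each letter contributes at most 26, the scores are squeezed into a tiny range. The sketch below is illustrative only; `bad_hash` is a hypothetical name and the sample strings are my own.

```python
def bad_hash(word: str) -> int:
    """Sum of alphabet positions (A=1 ... Z=26); assumes letters A-Z only."""
    return sum(ord(ch) - ord("A") + 1 for ch in word.upper())


# Anagrams always collide, because the sum does not depend on letter order.
print(bad_hash("LISTEN"), bad_hash("SILENT"))  # 79 79

# Unrelated letter strings collide too: 1+4 equals 2+3.
print(bad_hash("AD"), bad_hash("BC"))          # 5 5

# A five-letter string can only score between 5 (AAAAA) and 130 (ZZZZZ),
# so 26**5 possible strings share just 126 distinct values.
print(bad_hash("AAAAA"), bad_hash("ZZZZZ"), 26 ** 5)  # 5 130 11881376
```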

Distribution of Bad Hash values

Here is a breakdown of the distribution of bad hash values using the dictionary:
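A distribution like this can be tallied by counting how many words land on each score. The snippet below is a rough sketch using a tiny stand-in word list (an assumption; the real dictionary is not reproduced here) and a text histogram in place of a chart.

```python
from collections import Counter


def bad_hash(word: str) -> int:
    """Sum of alphabet positions (A=1 ... Z=26); assumes letters A-Z only."""
    return sum(ord(ch) - ord("A") + 1 for ch in word.upper())


# Stand-in word list; swap in a real dictionary file to get a real distribution.
words = ["KISS", "CAUSE", "SAUCE", "ANTIPAPAL"]

distribution = Counter(bad_hash(w) for w in words)

# One row per score, one '#' per word with that score.
for value in sorted(distribution):
    print(f"{value:3d} {'#' * distribution[value]}")
```

With this sample, CAUSE and SAUCE share the score 49, so that row gets two marks while 58 and 90 get one each.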

By length

Pivoting this data by word length, here is a chart showing the range of bad hash values by the length of the word:
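The pivot by word length might be sketched as follows, again with a small stand-in word list rather than the real dictionary. For a word of n letters the score can never fall outside n (all As) to 26n (all Zs), which the sketch prints alongside the observed range.

```python
from collections import defaultdict


def bad_hash(word: str) -> int:
    """Sum of alphabet positions (A=1 ... Z=26); assumes letters A-Z only."""
    return sum(ord(ch) - ord("A") + 1 for ch in word.upper())


# Stand-in word list (an assumption, not the dictionary used for the chart).
words = ["KISS", "CAUSE", "FUZZY", "ANTIPAPAL"]

# Group the scores by word length.
by_length = defaultdict(list)
for w in words:
    by_length[len(w)].append(bad_hash(w))

# Observed (min, max) score for each length.
ranges = {n: (min(scores), max(scores)) for n, scores in by_length.items()}

for n in sorted(ranges):
    lo, hi = ranges[n]
    print(f"length {n}: observed {lo}..{hi}, possible {n}..{26 * n}")
```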

ASCII

Binary    Oct  Dec  Hex  Char
100 0001  101  65   41   A
100 0010  102  66   42   B
100 0011  103  67   43   C
100 0100  104  68   44   D
100 0101  105  69   45   E
100 0110  106  70   46   F
100 0111  107  71   47   G
100 1000  110  72   48   H
100 1001  111  73   49   I
100 1010  112  74   4A   J
100 1011  113  75   4B   K
100 1100  114  76   4C   L
100 1101  115  77   4D   M
100 1110  116  78   4E   N
100 1111  117  79   4F   O
101 0000  120  80   50   P
101 0001  121  81   51   Q
101 0010  122  82   52   R
101 0011  123  83   53   S
101 0100  124  84   54   T
101 0101  125  85   55   U
101 0110  126  86   56   V
101 0111  127  87   57   W
101 1000  130  88   58   X
101 1001  131  89   59   Y
101 1010  132  90   5A   Z

Most Western computers encode letters for storage in something called ASCII (American Standard Code for Information Interchange). This standard, dating back to the 1960s, defines the numerical values used to represent not just letters, but digits, punctuation characters, a smattering of accents and the occasional math and currency symbols. It even has a bell-chime, and distinct carriage return and line-feed control characters for when information was sent to Teletype printing devices.

(It's since been superseded by Unicode, which offers significantly more space for international characters, but the vanilla alphabet remains in the same place).

In ASCII, upper-case letters start with the value A=65, B=66 …

Why these seemingly arbitrary values? Well, it relates to binary: as you can see from the table above, each letter is encoded as 64 plus its position in the alphabet. Since 64 is 100 0000 in binary, the low five bits of each code hold the letter's ordinal value.
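The relationship is easy to check with Python's built-in `ord`. This is just an illustrative sketch; `alphabet_position` is a made-up helper name, and it assumes an upper-case ASCII letter as input.

```python
def alphabet_position(ch: str) -> int:
    """Recover A=1 ... Z=26 by keeping only the low five bits of the ASCII code."""
    return ord(ch) & 0b11111


for ch in "ABZ":
    code = ord(ch)
    # Columns: character, decimal code, 7-bit binary, code - 64, low five bits
    print(ch, code, format(code, "07b"), code - 64, alphabet_position(ch))

# A 65 1000001 1 1
# B 66 1000010 2 2
# Z 90 1011010 26 26
```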

(A lot of thought and inherited history went into the creation of the ASCII table: numeric digits were placed to allow easier conversion to BCD [Binary Coded Decimal], and punctuation marks like !@# were kept in the shifted positions they occupied on original typewriter keys).

Numerologists

Numerologists (but probably nobody else) will take pleasure from the fact that, if you use the ASCII values of letters (A=65, B=66 …) instead of the ordinal values (A=1, B=2 …), the bad hash for the word ANTIPAPAL is 666 …
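Here is a small sketch of the ASCII variant. Since every letter's code is 64 more than its ordinal value, the ASCII score is simply the ordinal score plus 64 times the word length. The names `ascii_hash` and `bad_hash` are hypothetical, and the three-word list is only a stand-in for the dictionary.

```python
def bad_hash(word: str) -> int:
    """Ordinal score: A=1 ... Z=26, summed."""
    return sum(ord(ch) - 64 for ch in word.upper())


def ascii_hash(word: str) -> int:
    """ASCII score: A=65 ... Z=90, summed."""
    return sum(ord(ch) for ch in word.upper())


word = "ANTIPAPAL"
print(ascii_hash(word))                 # 666
print(bad_hash(word) + 64 * len(word))  # 666 (90 + 64 * 9)

# Filtering a word list for the same score (stand-in list for illustration).
words = ["ANTIPAPAL", "BOOMERANG", "KISS"]
print([w for w in words if ascii_hash(w) == 666])  # ['ANTIPAPAL', 'BOOMERANG']
```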

Non-numerologists might be more excited to learn that there are 341 other words with the same claim to fame:

ABOLISHES ABSENTEES ACCEPTERS ACCOUTRED ACETYLENE ACHIEVERS ACTINIANS ADVOCATES AESTHETIC AGUEWEEDS AIRFRAMES ALCHEMIST ALGERINES ALLOGRAPH AMBROSIAL AMNESTIED AMPLIFIES ANALITIES ANALOGIZE ANIMALIZE ANTINODAL ANTIPAPAL APPRAISEE ARAGONITE ARCHAIZES ARCHDUKES ARMCHAIRS ARTICHOKE ARTIFICES ASCERTAIN ASPIRATAE AUGMENTED BACKDROPT BANDEROLS BANEBERRY BANKBOOKS BANTERING BARBEQUES BATHROBES BECOWARDS BENTHONIC BESCREENS BESLIMING BESMILING BESWARMED BEWRAPPED BICKERERS BIGNONIAS BILLETING BIOETHICS BIPINNATE BIRDFARMS BLACKBOYS BLANDNESS BLINKARDS BLOTCHING BLOVIATED BLUEBELLS BOOGERMAN BOOKCASES BOOMERANG BRIDEWELL BRISANCES BROOMBALL BUBALISES BUFFETING CABEZONES CACHEPOTS CADASTERS CADASTRES CALISAYAS CALLALOOS CALVARIES CAMOMILES CAMPFIRES CAMSHAFTS CANALIZES CANVASING CAPONATAS CARROCHES CASEBOOKS CATALEXES CATCHPOLL CATENOIDS CATFISHES CATHOLICS CAVALIERS CAVALRIES CAVATINAS CELLMATES CELLULASE CEMENTING CENTIGRAM CHAMBRAYS CHANDLERY CHAPBOOKS CHARLOCKS CHASUBLES CHEAPNESS CHECKLIST CHEEKFULS CHIVAREES CHLAMYDES CHOPPERED CHROMATIC COALHOLES COCKLEBUR COLLIMATE COLORIFIC COMBATIVE COMMENCES COMMENDER CONCHOIDS CONDUCING CONFEREES CONFESSED CONFIDENT CONSIGNED CRANIATES CREMATING CREOLISED CRICETIDS DARNEDEST DAYDREAMS DEADWOODS DEATHBLOW DECEIVERS DECORATES DEFERMENT DEFLATERS DEHYDRATE DEMENTIAS DENSIFIES DIALOGERS DIATHESES DIETARIES DIFFICULT DIGASTRIC DIGITALIS DIGRESSED DISCLOSED DISPLEASE DIVIDENDS DRAGGIEST DRAMATISE DREADFULS DROPHEADS DUMBBELLS DYSPHAGIA ECOLOGIES EDUCATIVE EMANATIVE EMBODIERS EMBOWERED EMBRACERY EMITTANCE ENLIVENED ENTRAINED EPHEMERAS EXCLAIMER EXPLAINED FACEDOWNS FAIRYLAND FANCINESS FANTASIAS FARADISMS FARRAGOES FEOFFMENT FERMENTED FEUDALISM FIGEATERS FILIGREES FILMLANDS FIREBIRDS FIREHALLS FLAVANONE FLORIATED FORBODING FOREHANDS FORJUDGED FRAUGHTED GADABOUTS GALBANUMS GALVANISE GANGRENES GERIATRIC GIDDINESS GINGERING GLANDULAR GLOBALISM GOLCONDAS GRADELESS GRAVESIDE GREATCOAT GRILLAGES HAEMATINS HAGBUSHES HAMARTIAS HAPPENING HARDWIRED 
HAVOCKING HEADFIRST HECTOGRAM HICCUPING HOMESTEAD HONORABLE HYDATHODE IDEALIZES IMAGISTIC IMPLEDGES IMPRECATE IMPUDENCE INDIGOIDS INFANTILE INFOLDING INORGANIC KAFFIYEHS KEELBOATS KERFUFFLE LABRADORS LAICIZING LANDAULET LANDLINES LARBOARDS LAURELLED LEADSCREW LEGISLATE LIGNIFIES LOGICIZED MANICALLY MARSHLAND MASSAGING MASTHEADS MEANWHILE MEDIATION MEDIEVALS MEMBRANES MIDRANGES MINIBIKER MONADNOCK MONOGAMIC MORTGAGED MUCILAGES MUFFLERED NAUSEATED NEURALGIC NICCOLITE NICKERING NIGHTLIFE OBLIGATES ODDSMAKER OVERCOACH PACIFYING PAGANDOMS PALATABLY PANELLING PANORAMIC PARAFFINS PARANOEAS PATINATED PEBBLIEST PEDOPHILE PESTICIDE PICADORES PILCHARDS PIPELINED PLASMODIA PLAYBACKS PLAYFIELD PRECEDENT PRECENTED PRECREASE PREJUDGED PREVIABLE PROBABLES QUEBRACHO RACKETIER RAINMAKER RANSACKER REACCEPTS REANNEXED RECLOTHED RECOMMEND REDOUNDED REFOLDING REFUTABLE REINFLATE REJECTEES REJOICING RELEASING RELICENSE REMAINING RENIGGING RESEALING RHACHISES RIBBONING RUTABAGAS SAXIFRAGE SCALEPANS SCREWBEAN SCROOCHED SEBACEOUS SECONDING SECTARIAN SERENADES SHARPENED SHASHLICK SHEEPFOLD SHELLACKS SIDEKICKS SIDETRACK SIGNALMAN SKINHEADS SLEIGHING SLUGABEDS SONICATED SPACEWARD SPECIFIER STEAMERED TABLETING TAILPLANE TARIFFING TEAZELLED TENANCIES THATCHING THINCLADS TICTOCKED TOLERABLE TRACKSIDE TRAMELLED TREADLING UNBENDING UNDELUDED UNDRAINED VALENCIES VENENATED VICEREINE VIDEODISC VOCALISED WASHABLES WEEKENDER WEIGELIAS

 


© 2009-2014 DataGenetics    Privacy Policy