Graphs & Hash Tables (re: programming)

TranceAddict Forums Archive > Main Forums > Chill Out Room

Pages: [1] 2

Graphs & Hash Tables (re: programming)

View this Thread in Original format

UglyDave

ok folks, here's the deal..

i have an exam in the morning, and i was sick & missed two lectures.

$5 to the first person who can give me a brief explanation of what both of these things are.

thank you very much.

David :)

DigiNut

quote:
Originally posted by UglyDave
ok folks, here's the deal..

i have an exam in the morning, and i was sick & missed two lectures.

$5 to the first person who can give me a brief explanation of what both of these things are.

thank you very much.

David :)

A hash table is a system of storing objects based on some numeric function of that object. A hash can be any function, but as an example, let's say you're storing strings in a 100-element hash table, one way of hashing would be to take the ASCII codes of all the characters, add them together (sum), and store the string in array element (sum % 100).

The purpose is to provide a faster method of storing and searching, i.e. rather than having to sort alphabetically or scan through the entire array to find something.

Of course there also needs to be a system of dealing with hash conflicts, since no hashing function is completely unique. But that's a whole other topic.

As for graphs... I think we need some context there.

Noisician

quote:
Originally posted by DigiNut
A hash table is a system of storing objects based on some numeric function of that object. A hash can be any function, but as an example, let's say you're storing strings in a 100-element hash table, one way of hashing would be to take the ASCII codes of all the characters, add them together (sum), and store the string in array element (sum % 100).

The purpose is to provide a faster method of storing and searching, i.e. rather than having to sort alphabetically or scan through the entire array to find something.

Of course there also needs to be a system of dealing with hash conflicts, since no hashing function is completely unique. But that's a whole other topic.

As for graphs... I think we need some context there.

yes, basically.

hash tables are among the most convenient data-storage structures when it comes to insertion and searching (it's practically o[1] in the big-o notation). coders usually use them when there's a need to look up tens of thousands of items in less than a second. in fact, hash tabels *are* the fastest data structure. they are especially useful when large amounts of data are involved. they are what is typically used in spell checkers and as symbol tables in compilers. another advantage of the tables is their insensitivity toward the order in which data is stored/inserted (unlike trees). as a result, programming for hash tables is much and much easier than for (binary) trees. there are three kinds of their conceptual implementations: linear probing, quadric probing and separate chaining. all of them have their pros and cons. though hash tables with separate chaining are the hardest to program.

of course, as with all data-storage structures, hash tables have their own disadvantages. for one thing, all of them are based on arrays, and arrays are somewhat hard to expend once they've been created. in addition, with hash tables, there's no way to visit the stored items in any kind of order - that is, no ordered traversal is basically possible. that's where binary search trees would be a better choice. also, the efficiency of some particular kinds of hash tables is inversely proportional to their "fullness." their performance might decrease drastically if they become too full.

as for the graphs, i *think* in this case it referes to the big-o notation, which is merely a measure of the efficiency of a particular algorithm. one important thing to remember is that the purpose of a big-o graph (which shows the relationship between time and number of items) is NOT to depict an actual figure for running time, but rather to demonstrate the way running times are affected by the number of items. in other words, this is a kind of comparison that is directly related to the number of items. different algorithms have different running time in big-o notation and therefore have different graphs. for example,

the running time of linear search is represented through the o(n) notation where n is the number of items in the array.
the running time of insertion in an *unoredered* array is always o(1) = 1
the running time of insertion in an ordered array is o(n)
the running time of deletion in any array is o(n) as well
binary search is proportional to log(n)
etc. etc.

another important thing to know is the fact that the majority of these algorithms are represented through monotonously-increasing functions, which is a very bad thing because it in fact shows that the efficiency drops when the number of items increases. i can't think of any algorithm whose graph is a monotonously-decreasing function, which would be just perfect. because of this limitation, an algorithm that runs in o(1) time is the best.

this is why hash tables are almost ideal when working with huge amounts of data. no mattar how many data items there are, insertion and searching take close to constant time, the perfect o(1)!

DigiNut

quote:
Originally posted by Noisician
yes, basically.
...in fact, hash tabels *are* the fastest data structure. they are especially useful when large amounts of data are involved.

Not true, there are also scatter table implementations which eliminate the clustering/coalescence problems (re-hashing algorithms). They have all the advantages of hash tables but are more effective at dealing with collisions. I wasn't going to get into that but since you brought up the subject... :p

Even better is a self-chained scatter table, but that's pretty hard to implement.

Hash tables are the most convenient because they're readily available in the default C++/Java libraries.

Noisician

quote:
Originally posted by DigiNut
Not true, there are also scatter table implementations which eliminate the clustering/coalescence problems (re-hashing algorithms). They have all the advantages of hash tables but are more effective at dealing with collisions. I wasn't going to get into that but since you brought up the subject... :p

Even better is a self-chained scatter table, but that's pretty hard to implement.

Hash tables are the most convenient because they're readily available in the default C++/Java libraries.

quote:
Originally posted by Noisician
also, the efficiency of some particular kinds of hash tables is inversely proportional to their "fullness." their performance might decrease drastically if they become too full.

both chained scatter and open scatter algorithms may cause removal to be very complicated due to their limited capacity. as such tables become more saturated with data, their chains tend to accrete, creating new, anfractuous data-chains, which SIGNIFICANTLY slows down the search and removal operations.

DigiNut

quote:
Originally posted by Noisician
both chained scatter and open scatter algorithms may cause removal to be very complicated due to their limited capacity. as they become more saturated with data, their chains tend to accrete, creating new, anfractuous data-chains, which SIGNIFICANTLY slows down the search and removal operations.

That would be coalescence, which is why instead of that, you use re-hashing or inner chaining (not separate chaining).

:p

Noisician

quote:
Originally posted by DigiNut
That would be coalescence, which is why instead of that, you use re-hashing or inner chaining (not separate chaining).

:p

that solution is not ideal, either. even though using a second hash function drastically reduces *primary* clustering, the same problem exists nevertheless: the performance deteriorates as the table fills up.

and before u've mentioned this, of course u can use more than two functions, but such schemes would be very and very hard to implement.

TrAnCe CoNtRoL

quote:
Originally posted by Noisician
yes, basically.

hash tables are among the most convenient data-storage structures when it comes to insertion and searching (it's practically o[1] in the big-o notation). coders usually use them when there's a need to look up tens of thousands of items in less than a second. in fact, hash tabels *are* the fastest data structure. they are especially useful when large amounts of data are involved. they are what is typically used in spell checkers and as symbol tables in compilers. another advantage of the tables is their insensitivity toward the order in which data is stored/inserted (unlike trees). as a result, programming for hash tables is much and much easier than for (binary) trees. there are three kinds of their conceptual implementations: linear probing, quadric probing and separate chaining. all of them have their pros and cons. though hash tables with separate chaining are the hardest to program.

of course, as with all data-storage structures, hash tables have their own disadvantages. for one thing, all of them are based on arrays, and arrays are somewhat hard to expend once they've been created. in addition, with hash tables, there's no way to visit the stored items in any kind of order - that is, no ordered traversal is basically possible. that's where binary search trees would be a better choice. also, the efficiency of some particular kinds of hash tables is inversely proportional to their "fullness." their performance might decrease drastically if they become too full.

as for the graphs, i *think* in this case it referes to the big-o notation, which is merely a measure of the efficiency of a particular algorithm. one important thing to remember is that the purpose of a big-o graph (which shows the relationship between time and number of items) is NOT to depict an actual figure for running time, but rather to demonstrate the way running times are affected by the number of items. in other words, this is a kind of comparison that is directly related to the number of items. different algorithms have different running time in big-o notation and therefore have different graphs. for example,

the running time of linear search is represented through the o(n) notation where n is the number of items in the array.
the running time of insertion in an *unoredered* array is always o(1) = 1
the running time of insertion in an ordered array is o(n)
the running time of deletion in any array is o(n) as well
binary search is proportional to log(n)
etc. etc.

another important thing to know is the fact that the majority of these algorithms are represented through monotonously-increasing functions, which is a very bad thing because it in fact shows that the efficiency drops when the number of items increases. i can't think of any algorithm whose graph is a monotonously-decreasing function, which would be just perfect. because of this limitation, an algorithm that runs in o(1) time is the best.

this is why hash tables are almost ideal when working with huge amounts of data. no mattar how many data items there are, insertion and searching take close to constant time, the perfect o(1)!

hmm yes i agree

Photo_bot_2k1

nerds.
;)

Perfect_Cheezit

tranceaddict > hw resource help

i wonder if, when i get to college, ill be asking for help in this forum :wtf:

Galapidate

quote:
Originally posted by Photo_bot_2k1
nerds.
;)

:stongue: :stongue: :stongue:

ASOT100

quote:
Originally posted by Photo_bot_2k1
nerds.
;)

someone's jealous :tongue2

CLICK TO RETURN TO TOP OF PAGE

Pages: [1] 2

Privacy Statement