Hashmap - Java - Custom Hash MapTable Some Points

De openkb
Aller à : Navigation, rechercher

Sommaire

Questions

In some previous posts I have asked some questions about coding of Custom Hash Map/Table in java. Now as I can t solve it and may be I forgot to properly mentioning what I really want, I am summarizing all of them to make it clear and precise.

  What I am going to do:   

I am trying to code for our server in which I have to find users access type by URL.

Now, I have 1110 millions of URLs (approx).

So, what we did,

1) Divided the database on 10 parts each of 110 millions of Urls. 2) Building a HashMap using parallel array whose key are URL s one part (represented as LONG) and values are URL s other part (represented as INT) - key can have multiple values .

3) Then search the HashMap for some other URLs (millions of URLs saved in one day) per day at the beginning when system starts.

  What you have Tried:   

1) I have tried many NoSQL databases, however we found not so good for our purpose.

http://stackoverflow.com/questions/11398762/custom-hashmap-code-issue http://stackoverflow.com/questions/11398762/custom-hashmap-code-issue

  So, what the issue is:   

When the system starts we have to load our hashtable of each database and perform search for million of url:

Now, issue is,

http://stackoverflow.com/questions/11312553/java-project-make-hashmap-including-load-store-performance-better http://stackoverflow.com/questions/11312553/java-project-make-hashmap-including-load-store-performance-better

So, we are spending time: (HashTable Load + HashTable Search) * No. of DB = (5 + 20) * 10 = 250 seconds. Which is quite expensive for us and most of the time (200 out of 250 sec) is going for loading hashtables.

  Have you think any-other way:   

One way can be:

Without worrying about loading and storing, and leave caching to the operating system by using a memory-mapped buffer. But, as I have to search for millions of keys, it gives worser performance than above.

As we found HashTable performance is nice but loading time is high, we thought to cut it off in another way like:

http://stackoverflow.com/questions/11749817/java-custom-linked-list-issue http://stackoverflow.com/questions/11749817/java-custom-linked-list-issue

2) Insert values (int s) to the Linked Lists whose number is key number (we reduce the key size to INT).

3) So, we have to store only the linked lists to the disks.

Now, issue is, it is taking lots of time to create such amount of Linked Lists and creating such large amount of Linked Lists has no meaning if data is not well distributed.

  So, What is your requirements:   

Simply my requirements:

1) Key with multiple values insertion and searching. Looking for nice searching performance. 2) Fast way to load (specially) into memory.

(keys are 64 bit INT and Values are 32 bit INT, one key can have at most 2-3 values. We can make our key 32 bit also but will give more collisions, but acceptable for us, if we can make it better).

Can anyone help me, how to solve this or any comment how to solve this issue ?

Thanks.

NB:

1) As per previous suggestions of Stack Overflow, Pre-read data for disk caching is not possible because when system starts our application will start working and on next day when system starts.

2) We have not found NoSQL db s are scaling well as our requirements are simple (means just insert hashtable key value and load and search (retrieve values)).

3) As our application is a part of small project and to be applied on a small campus, I don t think anybody will buy me a SSD disk for that. That is my limitation.

4) We use Guava/ Trove also but they are not able to store such large amount of data in 16 GB also (we are using 32 GB ubuntu server.)

Answers

If you need quick access to 1110 million data items then hashing is the way to go. But dont reinvent the wheel, use something like:

Source

License : cc by-sa 3.0

http://stackoverflow.com/questions/11765517/java-custom-hash-map-table-some-points

Related

Outils personnels
Espaces de noms

Variantes
Actions
Navigation
Outils