Abstract
Background: Large-scale data poses growing challenges for efficient storage and access. Because of differences in programming interfaces and database schemas, emerging databases cannot completely replace the RDBMS. For the foreseeable future, therefore, using schema-free databases to help the RDBMS relieve the access bottleneck will remain a widely adopted approach to big data access in both industry and academia.
Objective: Because schema-free databases offer high performance and extensibility, they are commonly used as data caches. However, few effective solutions exist for maintaining a high cache hit rate, and frequently accessed data is not guaranteed to stay in the cache.
Method: This paper describes Patent Publication Number CN103631972A, titled "Method and System for column-aware data caching", issued by the State Intellectual Property Office of the P.R.C. on December 23, 2013. The caching process consists of judging whether an access is a cache hit or miss, updating the column access frequency, and capturing changed data. To increase the cache hit rate, the patent performs cache replacement based on column access frequency. The column access frequency is updated and cache replacement is maintained in three circumstances: transactional updates, non-transactional queries, and the cache listener. Transactional updates synchronize database updates to the cache system. Non-transactional queries and the cache listener correct the column access frequency using a frequency counter.
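The three circumstances described above can be illustrated with a minimal sketch. It is not the patented implementation: the class name `ColumnAwareCache`, the `loader` callback, the `(table, column)` cache key, and the `decay` and `min_freq` parameters are all assumptions introduced here for illustration, and the eviction rule (replace the cached column with the lowest access count) is only one plausible reading of "cache replacement using column access frequency".

```python
from collections import Counter


class ColumnAwareCache:
    """Toy column-level cache; all names here are hypothetical."""

    def __init__(self, capacity, loader):
        self.capacity = capacity  # maximum number of cached columns
        self.loader = loader      # assumed callback that reads a column from the RDBMS
        self.data = {}            # (table, column) -> cached values
        self.freq = Counter()     # frequency counter per column
        self.hits = 0
        self.misses = 0

    def query(self, table, column):
        """Non-transactional query: count the access, then judge hit or miss."""
        key = (table, column)
        self.freq[key] += 1
        if key in self.data:
            self.hits += 1
            return self.data[key]
        self.misses += 1
        values = self.loader(table, column)
        self._admit(key, values)
        return values

    def _admit(self, key, values):
        """Cache a column, replacing the least frequently accessed one if full."""
        if key not in self.data and len(self.data) >= self.capacity:
            victim = min(self.data, key=lambda k: self.freq[k])
            del self.data[victim]
        self.data[key] = values

    def on_database_update(self, table, column, values):
        """Transactional update: push a captured change into the cache."""
        key = (table, column)
        if key in self.data:
            self.data[key] = values

    def listener_tick(self, decay=0.5, min_freq=1.0):
        """Cache listener: age the counters and drop columns that went cold."""
        for key in list(self.freq):
            self.freq[key] *= decay
            if self.freq[key] < min_freq:
                self.data.pop(key, None)
                del self.freq[key]
```

In this sketch, `query` plays the role of the access judge and frequency counter, `on_database_update` stands in for change data capture, and `listener_tick` would be called periodically by a background task. Decaying the counters is an assumed way of letting formerly hot columns expire; the patent may use a different rule.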
Results: There are four results. First, column-aware data caching achieves low query time and high throughput. Second, dynamic cache replacement based on column access frequency improves the cache hit rate and guarantees eventual cache consistency. Third, the cache listener cleans out expired data so that hot data stays in the cache. Finally, the column-aware data caching system is transparent to developers. Cache consistency as used in this paper differs slightly from the cache coherency problem in distributed environments.
Conclusion: This paper presents the idea and a disclosed embodiment of a patent (Patent CN103631972A, issued by the State Intellectual Property Office of the P.R.C.) that is based on a distributed cache management system. In one disclosed embodiment, the system comprises an access judge, a frequency counter, a change data collector, and a data cache. The patent's applicability is illustrated by its efficient handling of automatic cache management.
Keywords: Big data, data cache, access frequency, cache hit, cache replacement, cache management.
Graphical Abstract