HBase 0.94 was released recently, four months after the previous 0.92 release. Let's take a look at the many feature enhancements in this version. English:
Chinese: http://blog.nosqlfan.com/html/3991.html
**Seek optimizations:** Before 0.94, if a region had several StoreFiles for a column family, HBase would seek in each of those files and merge the results, even if the row/column being looked up was in the most recent file. HBASE-4465, "Lazy Seek optimization of StoreFile Scanners", optimizes scanner reads to read the most recent StoreFile first by seeking the StoreFiles lazily. This is achieved by introducing a fake KeyValue whose timestamp equals the maximum timestamp present in the particular StoreFile. A disk seek is thus avoided until the KeyValueScanner for a StoreFile bubbles up to the top of the heap, which implies that a real read operation is needed. In other words, HBase reads the most recent StoreFile first and then checks whether any other StoreFile could hold writes newer than the data it found; if not, that StoreFile is not read at all. This greatly reduces the number of StoreFile reads and should provide a significant read performance boost, especially for IncrementColumnValue operations, where only the latest value matters. This feature is enabled by default.
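The lazy-seek idea can be illustrated with a small toy model. The sketch below is not HBase code: `StoreFile`, `seek`, and `get_latest` are hypothetical names invented for illustration, and ordering is reduced to timestamps for a single row/column. Each file first enters the heap with a fake key carrying its maximum timestamp, and a simulated disk seek happens only when that fake key reaches the top of the heap.

```python
import heapq


class StoreFile:
    """Toy stand-in for a StoreFile (hypothetical, not the HBase class)."""

    def __init__(self, name, cells):
        # cells maps (row, column) -> list of (timestamp, value) versions
        self.name = name
        self.cells = cells
        self.max_timestamp = max(
            ts for versions in cells.values() for ts, _ in versions
        )
        self.seeks = 0  # number of simulated disk seeks

    def seek(self, row, column):
        """Simulated disk seek: newest (timestamp, value) for the cell, or None."""
        self.seeks += 1
        versions = self.cells.get((row, column))
        if not versions:
            return None
        return max(versions, key=lambda v: v[0])


def get_latest(store_files, row, column):
    """Return (timestamp, value) of the newest version, seeking files lazily."""
    heap = []
    for i, sf in enumerate(store_files):
        # Fake key: built from metadata only, so no disk seek happens here.
        # Entry layout: (-timestamp, is_real, file_index, value)
        heapq.heappush(heap, (-sf.max_timestamp, 0, i, None))

    while heap:
        neg_ts, is_real, i, value = heapq.heappop(heap)
        if is_real:
            # A real key on top means no remaining file can hold a newer version.
            return -neg_ts, value
        # A fake key bubbled up to the top: only now do the real seek.
        found = store_files[i].seek(row, column)
        if found is not None:
            ts, val = found
            heapq.heappush(heap, (-ts, 1, i, val))
    return None


# Three files for one column family, oldest to newest.
old = StoreFile("old", {("r1", "c1"): [(100, "a")]})
mid = StoreFile("mid", {("r1", "c1"): [(200, "b")]})
new = StoreFile("new", {("r1", "c1"): [(300, "c")]})
latest = get_latest([old, mid, new], "r1", "c1")
```

In this toy run only the newest file is actually sought, because its real key is newer than the maximum timestamps of the other two files. On a timestamp tie the sketch resolves the fake key first, which is a conservative simplification chosen for this example.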