An Evaluation of Binary XML Encoding Optimizations for Fast Stream Based XML Processing

Roberto J. Bayardo, IBM Almaden Research Center, bayardo@alum.mit.edu
Daniel Gruhl, IBM Almaden Research Center, dgruhl@us.ibm.com
Vanja Josifovski, IBM Almaden Research Center, vanja@us.ibm.com
Jussi Myllymaki, IBM Almaden Research Center, jussi@us.ibm.com

ABSTRACT

This paper provides an objective evaluation of the performance impacts of binary XML encodings, using a fast stream-based XQuery processor as our representative application. Instead of proposing one binary format and comparing it against standard XML parsers, we investigate the individual effects of several binary encoding techniques that are shared by many proposals. Our goal is to provide a deeper understanding of the performance impacts of binary XML encodings in order to clarify the ongoing and often contentious debate over their merits, particularly in the domain of high performance XML stream processing.

Categories and Subject Descriptors

H.3.4 [Systems and Software]: Distributed systems; Information networks; Performance evaluation (efficiency and effectiveness)

General Terms

Performance, Algorithms

Keywords

XPath processing, XML binary formats

1. INTRODUCTION

While XML has arguably overcome questions of whether it will succeed as a lingua-franca of data interchange, debate continues as to whether XML has a role to play in performance-critical applications such as database systems and high-performance network-available services. Concerns about the performance of XML-based applications have steadily increased with XML's ubiquity. While these concerns are being addressed in various ways, one recurring proposal is to exploit binary serialization formats of the XML document model that are more efficient than the standard textual representation. The topic was in fact the subject of a recent W3C workshop [1], which itself contained half a dozen such proposals. The goals of these proposals typically fall along three dimensions: Some of them attempt to reduce the size overhead of XML data processing by applying XML-specific compression techniques [25, 15, 17]. Others aim to (also) improve parsing performance or, more simply, reduce the complexity

In response to such proposals, some (e.g. [22]) have argued that the rush to discard the benefits of the firmly standardized and convenient textual XML representation should not be made without hard, compelling evidence as to the necessity of such optimizations. Most of the previously cited proposals provide only limited performance studies that fail to precisely quantify the performance gains one might expect in a variety of applications.

In this paper we aim to provide an objective evaluation of the effect of "binarization" of XML data with respect to high performance XML stream processing. Rather than propose one single format for evaluation, we quantify the individual effects of several typical binary encoding optimizations that can be exploited during XML streaming. XML stream processors such as SAX-based parsers [18] avoid the memory management overhead of DOM-based approaches and are thus the method of choice when striving for high performance. By processing the XML stream as events, applications can more efficiently capture only the relevant portions of any incoming XML for immediate conversion into an optimized application-specific format.

This paper specifically examines the effects of optimized binary XML representations on high performance XML-processing applications such as XML database systems [26] and web services residing on a fast network [2]. We assume the infrastructure has suitable network bandwidth and storage such that the bottleneck is the XML processing itself, not the delivery of XML over the network or disk subsystem (which have motivated proposals for XML-specific compression schemes). We thus focus on optimizations that have the potential of providing benefits in this scenario. In situations where the bottleneck may also be network overhead, XML-specific or generic data compression schemes could be applied orthogonally, provided the compression scheme is of low overhead.

The paper proceeds as follows. We first outline various stream-based binary encoding options, starting from a "trivial binary" encoding, then extending it with other optimizations such as string tokenization and embedded offsets to support rapid identification of document elements that are of interest through random access. The next section discusses a stream-based XQuery processor, which we use as the basis of our evaluation. Applications that use stream-based XML processing are likely to use XPath or XQuery-like methods (if not an XPath or XQuery processor itself) for extracting the information of interest from the incoming XML. This method thus provides an evaluation that should be indicative of application performance in general. Our evaluation is a carefully crafted "apples to apples" comparison of the fastest XML parser we are aware
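
As an illustration of the event-based processing style the introduction describes, the following is a minimal sketch of a SAX handler that captures only the elements of interest and ignores the rest of the stream. It uses Python's standard xml.sax module, which is not the parser evaluated in the paper; the TitleCollector class, the extract_titles helper, the element name "title", and the sample document are all hypothetical and chosen only for this example.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Collects the text of every <title> element and ignores everything else."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # None means "not currently inside a <title>"

    def startElement(self, name, attrs):
        # Start buffering only when an element of interest opens.
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks; keep only chunks inside <title>.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        # On close, convert the buffered chunks into the application's format.
        if name == "title" and self._buffer is not None:
            self.titles.append("".join(self._buffer))
            self._buffer = None


def extract_titles(xml_bytes):
    """Parse a document as a stream of events and return the captured titles."""
    handler = TitleCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.titles


# Hypothetical sample input, used only to exercise the sketch.
SAMPLE = (
    b"<catalog>"
    b"<book><title>XML Streams</title><price>10</price></book>"
    b"<book><title>Binary Formats</title><price>12</price></book>"
    b"</catalog>"
)

if __name__ == "__main__":
    print(extract_titles(SAMPLE))
```

No tree is built at any point: the handler holds only the text of the element currently being captured, which is the memory-management advantage over DOM-based approaches that the text refers to.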
  • File Type : .pdf
  • Length : 10 pages
  • File Size: 261.8 kb
  • Source: www2004.org