Lazy parsing of sequence attributes #133

Open · GreyCat opened this issue Mar 17, 2017 · 3 comments
@GreyCat (Member) commented Mar 17, 2017

Sometimes we have fairly large / complex subtypes in a seq with a known size, which are not really used on every file read, so it's beneficial to read them lazily, i.e. on demand. A simple case:

seq:
  - id: foo
    type: u4
  - id: bar
    size: 100500
    type: something_else
    lazy: true # <= new proposed key
  - id: baz
    type: u4

See #65 (comment) for proposed generation results.
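For illustration, here is a minimal sketch of what the compiler might emit for bar in Python, assuming the usual runtime conventions (pos(), seek(), read_bytes(), and the _m_* caching pattern). This is only my reading, not the actual #65 proposal, and SomethingElse stands in for the hypothetical subtype class:

from io import BytesIO
from kaitaistruct import KaitaiStruct, KaitaiStream

class Example(KaitaiStruct):  # hypothetical class for the ksy above
    def __init__(self, _io, _parent=None, _root=None):
        self._io = _io
        self._parent = _parent
        self._root = _root if _root else self
        self._read()

    def _read(self):
        self.foo = self._io.read_u4le()
        # record where bar starts, then skip over its known size
        self._bar_ofs = self._io.pos()
        self._io.seek(self._bar_ofs + 100500)
        self.baz = self._io.read_u4le()

    @property
    def bar(self):
        # parse bar on first access, then cache the result
        if hasattr(self, "_m_bar"):
            return self._m_bar
        _pos = self._io.pos()
        self._io.seek(self._bar_ofs)
        raw = self._io.read_bytes(100500)
        self._m_bar = SomethingElse(KaitaiStream(BytesIO(raw)), self, self._root)
        self._io.seek(_pos)
        return self._m_bar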

Another, more complex option is to have lazy seq arrays with fixed-size elements and repeat-expr or repeat-eos, so that the number of elements is known a priori. That's probably worth discussing after we've handled this simple case.

@KOLANICH commented Oct 6, 2017

For Python there is a nice module, https://github.com/ionelmc/python-lazy-object-proxy, which may be useful, but KSC would have to be altered in order to use it.

But there are obvious drawbacks to lazy parsing: for now we can parse and close the file, whereas lazy parsing means that we cannot close the file until all the needed parts of a struct have been accessed. So there should be some parameter controlling it. I guess we may need a hook function in the runtime to which we pass a lambda that accesses the property and returns the KS object. The non-lazy hook would execute the lambda immediately; the lazy one would call lazy_object_proxy.Proxy.
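A minimal sketch of such a hook pair (hypothetical names; lazy_object_proxy.Proxy does accept a zero-argument factory like this):

import lazy_object_proxy

def resolve_eager(factory):
    # non-lazy mode: run the parsing lambda immediately
    return factory()

def resolve_lazy(factory):
    # lazy mode: defer parsing until the proxy is first dereferenced
    return lazy_object_proxy.Proxy(factory)

# Generated code would then call whichever hook is configured, e.g.:
#   self._m_bar = resolve(lambda: SomethingElse(self._io, self, self._root))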

@milahu commented Apr 2, 2023

for now we can parse and close the file

What if my file is 100 GB, but I have only 1 GB of RAM?
(Example: an sqlite3 database.)

Currently, kaitai-struct-compiler generates:

    @property
    def pages(self):
        if hasattr(self, "_m_pages"):
            return self._m_pages

        self._m_pages = []
        for i in range(self.header.num_pages):
            self._m_pages.append(
                Sqlite3.Page(
                    (i + 1), (self.header.page_size * i), self._io, self, self._root
                )
            )
        return self._m_pages

when it should generate

class Sqlite3(KaitaiStruct):

    class PagesList:
        def __init__(self, root):
            self.root = root

        def __len__(self):
            return self.root.header.num_pages

        def __getitem__(self, i):  # i is 0-based
            if i < 0:  # -1 means last page, etc
                i = self.root.header.num_pages + i

            assert 0 <= i < self.root.header.num_pages, \
                f"page index is out of range: {i} is not in [0, {self.root.header.num_pages - 1}]"

            # TODO LRU cache with sparse array?
            # note: LRU cache does not give pointer equality
            # but equality check is trivial: page_a.page_number == page_b.page_number

            _pos = self.root._io.pos()
            self.root._io.seek(i * self.root.header.page_size)
            page = Sqlite3.Page(
                (i + 1), (self.root.header.page_size * i), self.root._io, self.root, self.root._root
            )
            self.root._io.seek(_pos)
            return page

    def __init__(self, _io, _parent=None, _root=None):
        self._io = _io
        self._parent = _parent
        self._root = _root if _root else self
        # add this line:
        self.pages = Sqlite3.PagesList(self)
        self._read()
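A quick usage sketch, assuming the rest of the generated parser (header, _read) stays as-is and using the runtime's from_file helper (the file name is made up):

db = Sqlite3.from_file("big.db")
print(len(db.pages))       # cheap: derived from the already-parsed header
last_page = db.pages[-1]   # seeks to and parses just that one page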

For now I'm patching the generated sqlite3.py parser.

See also: Can Kaitai Struct be used to describe TLV data without creating new types for each field? (via kaitai-io/kaitai_struct_formats#661 (comment))

keywords: random access, array, list, nested parsing, deferred parsing

@KOLANICH commented Apr 2, 2023

What if my file is 100 GB, but I have only 1 GB of RAM?

#65 was created with exactly that use case in mind: not to store all the data in memory, but to map the file, store offsets, and let the OS read the data only when needed, re-parsing it again on access (and, in the case of fixed structs, at least for C++ and Rust, without actually parsing at all when no serialization is needed). This is not very suitable for systems without an MMU, such as Arduino boards. For them it may be possible to generate instances, getting the ranges.

But again, it is not very suitable to read the whole array into memory, so we'd need to generate an index first and then reuse it. And we'd need the runtime to forget the stuff it doesn't need. All of these are not straightforward decisions, and I doubt they can be made without consulting a general intelligence. The knowledge of the right decisions can be incorporated into ksy files in the form of hints (#225). Probably there should be a mode for the compiler that adds hint stubs (all suitable hints for the case) into the ksy file in the places they are needed; a programmer can then use a diff tool to review the hints and choose the right ones.
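For the memory-mapping part, a minimal Python sketch (mmap objects are file-like enough, exposing read/seek/tell, for KaitaiStream to wrap directly; the file name is made up):

import mmap
from kaitaistruct import KaitaiStream

# Map the file read-only: the OS pages data in only when it is actually read.
with open("big.db", "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    db = Sqlite3(KaitaiStream(mm))
    page = db.pages[12345]  # touches only the OS pages backing this read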
