11Introduction
22============
33
4- The original versions of this code up until the time I started were
4+ The original versions of this code, up until the time I started, were
55pretty awesome. You can get a sense of this by running it. For the
6- most part it was remarkably fast, and a single module with few dependencies.
6+ most part, it was remarkably fast and a single module with few dependencies.
77
8- Here I will largely give what are the major improvements over old code.
8+ Here, I will largely describe the major improvements over the old code.
99
1010This also serves to outline a little bit about what is in this code.
1111
@@ -14,39 +14,35 @@ See also `How does this code work? <https://github.com/rocky/python-uncompyle6/w
1414Old Cool Features
1515==================
1616
17- Before getting to the new stuff, I'll describe cool things that was there before.
17+ Before getting to the new stuff, I'll describe cool things that were there before.
1818
1919I particularly liked the ability to show the assembly, grammar
2020reduction rules as they occurred, and the resulting parse tree. It is
2121neat that you could follow the process and steps that deparser takes,
22- and in this not only see the result how the bytecode corresponds to
22+ and in this way see not only the result but also how the bytecode corresponds to
2323the resulting source. Compare this with other Python decompilers.
2424
25- And of course also neat was that this used a grammar and table-driven
25+ And of course, also neat was that this used a grammar and table-driven
2626approach to decompile.
2727
2828
2929Expanding decompilation to multiple Python Versions
3030==================================================
3131
3232Aside from ``pycdc``, most of the Python decompilers handle a small
33- number of Python versions, if they supported more than one. And even
34- when more than one version is supported if you have to be running the
35- Python version that the bytecode was compiled for .
33+ number of Python versions, if they support more than one. And even
34+ when more than one version is supported, you have to be running the
35+ Python version for which the bytecode was compiled.
3636
37- There main reason that you have to be running the Python bytecode
38- interpreter as the one you want to decompile largely stems from the
39- fact that Python's ``dis `` module is often what is used and that has this limitation.
37+ This is largely because Python's ``dis`` module, which is often what is used, requires you to be running the Python interpreter that matches the bytecode you want to decompile.
4038
4139``pycdc`` doesn't suffer from this problem because it is written in C++,
4240not Python. Hartmut Goebel's code had provisions for multiple Python
4341versions running from an interpreter different from the one that was
44- running the decompiler. That however used compiled code in the process
45- was tied a bit to the Python C headers for a particular version.
42+ running the decompiler. That, however, used compiled code in the process, which was tied a bit to the Python C headers for a particular version.
4643
47- You need to not only to account for different "marshal" and "unmarshal"
48- routines for the different Python versions, but also, as the Python versions
49- extend, you need a different code type as well.
44+ You need to not only account for different "marshal" and "unmarshal"
45+ routines for the different Python versions, but also, as the Python versions change, you sometimes need a different code type as well.
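
As a rough illustration of why the marshal format matters, here is a minimal sketch that uses only the standard library, not ``xdis``. The standard ``marshal`` module reads and writes only the format of the interpreter it runs in, and each interpreter stamps its bytecode files with its own magic number. The source string and variable names are made up for the example.

```python
import importlib.util
import marshal

# A tiny program to compile; the name "answer" is purely illustrative.
source = "answer = 6 * 7"
code = compile(source, "<example>", "exec")

# Serialize and restore the code object using the running interpreter's
# marshal format. Another Python version may not be able to read this blob.
blob = marshal.dumps(code)
restored = marshal.loads(blob)

# The magic number that this interpreter writes at the start of a .pyc file.
magic = importlib.util.MAGIC_NUMBER

namespace = {}
exec(restored, namespace)
print(magic, type(restored).__name__, namespace["answer"])
```

A cross-version tool cannot rely on this round trip, which is why it needs its own unmarshal routines and its own code type per bytecode version.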
5046
5147Enter ``xdis``
5248--------------
@@ -57,8 +53,8 @@ portion and disassembly routines into a separate module,
5753found in newer Pythons, such as parsing the bytecode, a uniform stream
5854of bytes, into a list of structured bytecode instructions.
5955
60- Python 2.7's ``dis `` module doesn't has provide a instruction abstraction.
61- Therefore in ``uncompyle2 `` and other earlier decompilers you see code with magic numbers like 4 in::
56+ Python 2.7's ``dis`` module doesn't have an instruction abstraction.
57+ Therefore, in ``uncompyle2`` and other earlier decompilers, you see code with magic numbers like 4 in::
6258
6359 if end > jump_back+4 and code[end] in (JF, JA):
6460 if code[jump_back+4] in (JA, JF):
@@ -81,125 +77,120 @@ and in other code -1 and 3 in::
8177 i = jmp + 3
8278
8379All of that offset arithmetic is trying to find the next instruction
84- offset or the previous offset. Using a list of instructions you simply
80+ offset or the previous offset. Using a list of instructions, you simply
8581take the ``offset `` field of the previous or next instruction.
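
To make the contrast concrete, here is a minimal sketch using the standard library's ``dis.get_instructions`` (available in Python 3.4 and later) rather than ``xdis`` itself. The helper name ``neighbor_offsets`` is hypothetical. With a list of instructions, the previous and next offsets are read from neighboring list entries instead of being computed with magic numbers.

```python
import dis


def neighbor_offsets(code):
    """Return (offset, opname, previous offset, next offset) per instruction."""
    instructions = list(dis.get_instructions(code))
    rows = []
    for i, inst in enumerate(instructions):
        # Neighbors come from list positions, not from offset arithmetic.
        prev_offset = instructions[i - 1].offset if i > 0 else None
        next_offset = (
            instructions[i + 1].offset if i + 1 < len(instructions) else None
        )
        rows.append((inst.offset, inst.opname, prev_offset, next_offset))
    return rows


rows = neighbor_offsets(compile("x = a + b", "<example>", "exec"))
for row in rows:
    print(row)
```

The same loop works no matter how many bytes each instruction occupies in a given bytecode version.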
8682
8783The above code appears in the ``uncompyle2`` "Scanner" class in
88- service of trying to figure out control flow. Note also that there
89- isn't a single comment in there about what specifically it is trying
90- to do, the logic or that would lead one to be confident that this is
84+ service of trying to figure out the control flow. Note also that there isn't a single comment in there about what specifically it is trying
85+ to do or what the logic is, nothing that would lead one to be confident that this is
9186correct, let alone the assumptions that are needed for this to be true.
9287
93- While this might largely work for Python 2.7, and ``uncompyle2 `` does
94- get control flow wrong sometimes, it is impossible to adapt code for
88+ While this might largely work for Python 2.7 (and ``uncompyle2`` does get the control flow wrong sometimes), it is impossible to adapt the code for
9589other versions of Python.
9690
97- In addition adding an instruction structure, ``xdis `` adds various
91+ In addition to adding an instruction structure, ``xdis`` adds various
9892flags and features that assist in working with instructions. In the
99- example above this replaces code like ``... in (JF, JA) `` which is
93+ example above, this replaces code like ``... in (JF, JA)``, which tests for
10094some sort of unconditional jump instruction.
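
The idea of asking "is this a jump?" through a classification table, instead of listing opcodes by hand, can be sketched with the standard library's ``dis.hasjrel`` and ``dis.hasjabs`` lists. This is only an analogy to what ``xdis`` offers and does not use the ``xdis`` API. The helper name ``jump_instructions`` is hypothetical, and these tables cover all jumps, not only unconditional ones.

```python
import dis

# Opcode numbers that the running interpreter classifies as jumps.
JUMP_OPCODES = set(dis.hasjrel) | set(dis.hasjabs)


def jump_instructions(code):
    """Return (offset, opname, argval) for every jump instruction in code."""
    return [
        (inst.offset, inst.opname, inst.argval)
        for inst in dis.get_instructions(code)
        if inst.opcode in JUMP_OPCODES
    ]


sample = compile("if flag:\n    x = 1\nelse:\n    x = 2\n", "<example>", "exec")
jumps = jump_instructions(sample)
for jump in jumps:
    print(jump)
```

Because the test is table-driven, the same function keeps working when a Python release renames or adds jump opcodes.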
10195
102- Although not needed in the decompiler, ``xdis `` also has nicer
96+ Although not needed in the decompiler, ``xdis`` also has a nicer
10397instruction print format. It can show you the bytes as well as the
10498interpreted instructions. It will interpret flag bits and packed
10599structures in operands so you don't have to. It can even do a limited
106- form of inspection at previous instructions to give a more complete
107- description of an operand. For example on ``LOAD_ATTR `` which loads
100+ form of inspection of the previous instructions to give a more complete
101+ description of an operand. For example, for ``LOAD_ATTR``, which loads
108102the attribute of a variable, often the variable name can be found as
109- the previous instruction. When that is the case the disassembler can
103+ the previous instruction. When that is the case, the disassembler can
110104include that in the disassembly display for the ``LOAD_ATTR`` operand.
111105
112106
113107Python Grammar Isolation
114108------------------------
115109
116- If you want to support multiple versions of Python in a manageable way
110+ If you want to support multiple versions of Python in a manageable way,
117111you really need to provide different grammars for the different
118112versions, in a grammar-based system. None of the published versions of
119113this decompiler did this.
120114
121- If you look at the changes in this code, right now there are no
122- grammar changes needed between 1.0 to 1.3. (Some of this may be wrong
123- though since we haven't extensively tested these earliest Python versions
115+ If you look at the changes in this code, right now, there are no
116+ grammar changes needed between 1.0 and 1.3. (Some of this may be wrong since we haven't extensively tested these earliest Python versions.)
124117
125- For Python 1.4 which is based off of the grammar for 1.5 though there
126- are number of changes, about 6 grammar rules. Later versions of though
127- we start to see larger upheaval and at certain places, especially
118+ For Python 1.4, which is based on the grammar for 1.5, there
119+ are a number of changes, about 6 grammar rules. In later versions, though,
120+ we start to see larger changes. At certain places, especially
128121those where new opcodes are introduced, and in particular those that change
129122the way calls or exceptions get handled, we have major upheaval in the
130123grammar. It is not just that some rules get added, but we also need to
131124*remove* some grammar rules as well.
132125
133126I have been largely managing this as incremental differences between versions.
134- However in the future I am leaning more towards totally separate grammars.
135- A well constructed grammar doesn't need to be that large.
127+ However, in the future, I am leaning more towards totally separate grammars.
128+ A well-constructed grammar doesn't need to be that large.
136129
137130When starting out a new version, we can just copy the grammar from the
138- prior version. Within a Python version though , I am breaking these
139- into composable pieces. In particular the grammar for handling what
140- can appear as the body of a lambda, is a subset of the full Python
131+ prior version. Within a Python version, I am breaking these
132+ into composable pieces. In particular, the grammar for handling what
133+ can appear as the body of a lambda is a subset of the full Python
141134language. The language allowed in an ``eval`` is also a subset of the
142135full Python language, as is what can appear in the various
143136compilation modes like "single" versus "exec".
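
The compilation modes mentioned above are the ones accepted by the built-in ``compile`` function. The following minimal sketch shows that "eval" mode accepts only an expression, while "exec" and "single" accept statements. The helper name ``compiles_as_expression`` is hypothetical.

```python
# The same built-in compiles source under three different modes.
expr_code = compile("1 + 2", "<example>", "eval")
stmt_code = compile("total = 1 + 2", "<example>", "exec")
single_code = compile("total = 1 + 2", "<example>", "single")


def compiles_as_expression(source):
    """Return True if source is accepted by the "eval" subset of Python."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


namespace = {}
exec(stmt_code, namespace)
print(eval(expr_code), namespace["total"])
print(compiles_as_expression("1 + 2"))
print(compiles_as_expression("total = 1 + 2"))
```

A decompiler that knows which mode produced a code object can use a correspondingly smaller grammar.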
144137
145- Another nice natural self-contain grammar section is what can appear
138+ Another nice natural self-contained grammar section is what can appear
146139in list comprehensions and generators. The bodies of these are
147140generally represented in a self-contained code block.
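
To see these self-contained code blocks, here is a minimal sketch that walks the constants of a compiled module and collects the nested code objects. It uses a generator expression and a lambda because CPython 3.12 and later inline list, dict, and set comprehensions into the enclosing code. The helper name ``nested_code_objects`` is hypothetical.

```python
import types


def nested_code_objects(code):
    """Recursively collect code objects stored in the constants of code."""
    found = []
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            found.append(const)
            found.extend(nested_code_objects(const))
    return found


module_code = compile(
    "squares = (n * n for n in range(3))\nf = lambda v: v + 1\n",
    "<example>",
    "exec",
)
names = [c.co_name for c in nested_code_objects(module_code)]
print(names)
```

Each collected code object could be handed to a decompiler on its own, which is what makes these pieces natural breaking points.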
148141
149- Often in decompilation you may be interested not just in decompiling
150- the entire code but you may be interested in only focusing on a
151- specific part of the code. And if there is a problem in decompiling
142+ Often in decompilation, you may be interested not in decompiling
143+ the entire code but only in focusing on a specific part of the code. And if there is a problem in decompiling
152144the entire piece of code, having these smaller breaking points can be
153145of assistance.
154146
155147Other Modularity
156148----------------
157149
158- Above we have mentioned the need for separate grammars or to isolate
159- these per versions . But there are other major pieces that make up this
160- decompiler. In particular there is a scanner and the source code
150+ Above, we have mentioned the need for separate grammars or to isolate
151+ these per version. But there are other major pieces that make up this
152+ decompiler. In particular, there is a scanner and the source code
161153generation part.
162154
163155Even though differences in version that occur in disassembly are
164- handled by ``xdis ``, we still have to do conversion of that to a token
165- stream for parsing. So the scanners are again broken out per version
166- with various OO mechanisms for reusing code. The same is true for
156+ handled by ``xdis``, we still have to do the conversion of that to a token
157+ stream for parsing. So the scanners are again broken out per version, with various OO mechanisms for reusing code. The same is true for
167158source code generation.
168159
169160
170161Expanding decompiler availability to multiple Python Versions
171162--------------------------------------------------------------
172163
173- Above we mention decompiling multiple versions of bytecode from a
164+ Above, we mentioned decompiling multiple versions of bytecode from a
174165single Python interpreter. Here, we talk about having the decompiler
175- runnable from multiple versions of Python, independent of the set of
166+ runnable from many versions of Python, independent of the set of
176167bytecode that the decompiler supports.
177168
178169
179170There are slight advantages in having a decompiler that runs the same
180171version as the code you are decompiling. The most obvious one is that
181- it makes it easy to test to see whether the decompilation correct
172+ it makes it easy to test to see whether the decompilation is correct
182173because you can run the decompiled code. Python comes with a suite of
183- Python programs that check themselves and that aspects of Python are
174+ Python programs that check themselves and ensure that aspects of Python are
184175implemented correctly. These also make excellent programs to check
185- whether a program has decompiled correctly.
176+ whether a program has been decompiled correctly.
186177
187178Aside from this, debugging can be easier as well. To assist
188- understanding bytecode and single stepping it see `x-python
179+ in understanding bytecode and single-stepping it, see `x-python
189180<https://pypi.org/project/x-python/>`_ and the debugger for it,
190181`trepan-xpy <https://pypi.org/project/trepanxpy/>`_.
191182
192183Handling Language Drift
193184-----------------------
194185
195- Given the desirability of having this code running on logs of Python
186+ Given the desirability of having this code running on many Python
196187versions, how can we get this done?
197188
198189The solution used here is to have several git branches of the
199- code. Right now there are 3 branches. Each branch handles works across
200- 3 or so different releases of Python. In particular one branch handles
190+ code. Right now, there are 3 branches. Each branch works across
191+ 3 or so different releases of Python. In particular, one branch handles
201192Python 2.4 to 2.7. Another handles Python 3.3 to 3.5, and the master
202- branch handles 3.6 to 3.10. (Again note that the 3.9 and 3.10
193+ branch handles 3.6 to 3.10. (Again, note that the 3.9 and 3.10
203194decompilers do not decompile Python 3.9 or 3.10, but they do handle
204195bytecode for all earlier versions.)
205196
@@ -211,8 +202,8 @@ Cool features of the Parser
211202* numbering tokens
212203* showing a stack of completions
213204
214- Cool features Semantic Analysis
215- ===============================
205+ Cool features of Semantic Analysis
206+ ===================================
216207
217208* ``--tree++`` (``-T``) option
218209* showing precedence