Commit 8a81771

Update NEW_FEATURES.rst
1 parent e0f1cf4 commit 8a81771

1 file changed

+56
-65
lines changed

NEW_FEATURES.rst

@@ -1,11 +1,11 @@
11
Introduction
22
============
33

4-
The original versions of this code up until the time I started were
4+
The original versions of this code, up until the time I started, were
55
pretty awesome. You can get a sense of this by running it. For the
6-
most part it was remarkably fast, and a single module with few dependencies.
6+
most part, it was remarkably fast and a single module with few dependencies.
77

8-
Here I will largely give what are the major improvements over old code.
8+
Here, I will largely describe the major improvements over the old code.
99

1010
This also serves to outline a little bit about what is in this code.
1111

@@ -14,39 +14,35 @@ See also `How does this code work? <https://github.com/rocky/python-uncompyle6/w
1414
Old Cool Features
1515
==================
1616

17-
Before getting to the new stuff, I'll describe cool things that was there before.
17+
Before getting to the new stuff, I'll describe cool things that were there before.
1818

1919
I particularly liked the ability to show the assembly, grammar
2020
reduction rules as they occurred, and the resulting parse tree. It is
2121
neat that you could follow the process and steps that deparser takes,
22-
and in this not only see the result how the bytecode corresponds to
22+
and in this, see not only the result but also how the bytecode corresponds to
2323
the resulting source. Compare this with other Python decompilers.
2424

25-
And of course also neat was that this used a grammar and table-driven
25+
And of course, also neat was that this used a grammar and table-driven
2626
approach to decompile.
2727

2828

2929
Expanding decompilation to multiple Python Versions
3030
==================================================
3131

3232
Aside from ``pycdc``, most of the Python decompilers handle a small
33-
number of Python versions, if they supported more than one. And even
34-
when more than one version is supported if you have to be running the
35-
Python version that the bytecode was compiled for.
33+
number of Python versions, if they support more than one. And even
34+
when more than one version is supported, you have to be running the
35+
Python version for which the bytecode was compiled.
3636

37-
There main reason that you have to be running the Python bytecode
38-
interpreter as the one you want to decompile largely stems from the
39-
fact that Python's ``dis`` module is often what is used and that has this limitation.
37+
This is largely because Python's ``dis`` module, which is often what is used, requires you to run the same Python interpreter version as the bytecode you want to decompile.
4038

4139
``pycdc`` doesn't suffer this problem because it is written in C++,
4240
not Python. Hartmut Goebel's code had provisions for multiple Python
4341
versions running from an interpreter different from the one that was
44-
running the decompiler. That however used compiled code in the process
45-
was tied a bit to the Python C headers for a particular version.
42+
running the decompiler. That, however, used compiled code in the process, which was tied a bit to the Python C headers for a particular version.
4643

47-
You need to not only to account for different "marshal" and "unmarshal"
48-
routines for the different Python versions, but also, as the Python versions
49-
extend, you need a different code type as well.
44+
You need to not only account for different "marshal" and "unmarshal"
45+
routines for the different Python versions, but also, as the Python versions change, you sometimes need a different code type as well.
5046

5147
Enter ``xdis``
5248
--------------
@@ -57,8 +53,8 @@ portion and disassembly routines into a separate module,
5753
found in newer Pythons, such as parsing the bytecode, a uniform stream
5854
of bytes, into a list of structured bytecode instructions.
5955

60-
Python 2.7's ``dis`` module doesn't has provide a instruction abstraction.
61-
Therefore in ``uncompyle2`` and other earlier decompilers you see code with magic numbers like 4 in::
56+
Python 2.7's ``dis`` module doesn't have an instruction abstraction.
57+
Therefore, in ``uncompyle2`` and other earlier decompilers, you see code with magic numbers like 4 in::
6258

6359
if end > jump_back+4 and code[end] in (JF, JA):
6460
if code[jump_back+4] in (JA, JF):
@@ -81,125 +77,120 @@ and in other code -1 and 3 in::
8177
i = jmp + 3
8278

8379
All of that offset arithmetic is trying to find the next instruction
84-
offset or the previous offset. Using a list of instructions you simply
80+
offset or the previous offset. Using a list of instructions, you simply
8581
take the ``offset`` field of the previous or next instruction.
8682

8783
The above code appears in the ``uncompyle2`` "Scanner" class in
88-
service of trying to figure out control flow. Note also that there
89-
isn't a single comment in there about what specifically it is trying
90-
to do, the logic or that would lead one to be confident that this is
84+
service of trying to figure out the control flow. Note also that there isn't a single comment in there about what specifically it is trying
85+
to do, what the logic is, or anything that would lead one to be confident that this is
9186
correct, let alone assumptions that are needed for this to be true.
9287

93-
While this might largely work for Python 2.7, and ``uncompyle2`` does
94-
get control flow wrong sometimes, it is impossible to adapt code for
88+
While this might largely work for Python 2.7, and ``uncompyle2`` does get the control flow wrong sometimes, it is impossible to adapt the code for
9589
other versions of Python.
9690

97-
In addition adding an instruction structure, ``xdis`` adds various
91+
In addition to adding an instruction structure, ``xdis`` adds various
9892
flags and features that assist in working with instructions. In the
99-
example above this replaces code like ``... in (JF, JA)`` which is
93+
example above, this replaces code like ``... in (JF, JA)`` which is
10094
some sort of unconditional jump instruction.
10195

102-
Although not needed in the decompiler, ``xdis`` also has nicer
96+
Although not needed in the decompiler, ``xdis`` also has a nicer
10397
instruction print format. It can show you the bytes as well as the
10498
interpreted instructions. It will interpret flag bits and packed
10599
structures in operands so you don't have to. It can even do a limited
106-
form of inspection at previous instructions to give a more complete
107-
description of an operand. For example on ``LOAD_ATTR`` which loads
100+
form of inspection of the previous instructions to give a more complete
101+
description of an operand. For example, on ``LOAD_ATTR`` which loads
108102
the attribute of a variable, often the variable name can be found as
109-
the previous instruction. When that is the case the disassembler can
103+
the previous instruction. When that is the case, the disassembler can
110104
include that in the disassembly display for the ``LOAD_ATTR`` operand.
111105

112106

113107
Python Grammar Isolation
114108
------------------------
115109

116-
If you want to support multiple versions of Python in a manageable way
110+
If you want to support multiple versions of Python in a manageable way,
117111
you really need to provide different grammars for the different
118112
versions, in a grammar-based system. None of the published versions of
119113
this decompiler did this.
120114

121-
If you look at the changes in this code, right now there are no
122-
grammar changes needed between 1.0 to 1.3. (Some of this may be wrong
123-
though since we haven't extensively tested these earliest Python versions
115+
If you look at the changes in this code, right now, there are no
116+
grammar changes needed between 1.0 and 1.3. (Some of this may be wrong since we haven't extensively tested these earliest Python versions.)
124117

125-
For Python 1.4 which is based off of the grammar for 1.5 though there
126-
are number of changes, about 6 grammar rules. Later versions of though
127-
we start to see larger upheaval and at certain places, especially
118+
For Python 1.4, which is based on the grammar for 1.5, there
119+
are a number of changes, about 6 grammar rules. In later versions, though,
120+
we start to see larger upheaval, and at certain places, especially
128121
those where new opcodes are introduced, especially those that change
129122
the way calls or exceptions get handled, we have major upheaval in the
130123
grammar. It is not just that some rules get added, but we also need to
131124
*remove* some grammar rules as well.
132125

133126
I have been largely managing this as incremental differences between versions.
134-
However in the future I am leaning more towards totally separate grammars.
135-
A well constructed grammar doesn't need to be that large.
127+
However, in the future, I am leaning more towards totally separate grammars.
128+
A well-constructed grammar doesn't need to be that large.
136129

137130
When starting out a new version, we can just copy the grammar from the
138-
prior version. Within a Python version though, I am breaking these
139-
into composable pieces. In particular the grammar for handling what
140-
can appear as the body of a lambda, is a subset of the full Python
131+
prior version. Within a Python version, I am breaking these
132+
into composable pieces. In particular, the grammar for handling what
133+
can appear as the body of a lambda is a subset of the full Python
141134
language. The language allowed in an ``eval`` is also a subset of the
142135
full Python language, as are what can appear in the various
143136
compilation modes like "single" versus "exec".
144137

145-
Another nice natural self-contain grammar section is what can appear
138+
Another nice natural self-contained grammar section is what can appear
146139
in list comprehensions and generators. The bodies of these are
147140
generally represented in a self-contained code block.
148141

149-
Often in decompilation you may be interested not just in decompiling
150-
the entire code but you may be interested in only focusing on a
151-
specific part of the code. And if there is a problem in decompiling
142+
Often in decompilation, you may be interested not in decompiling
143+
the entire code, but only in focusing on a specific part of the code. And if there is a problem in decompiling
152144
the entire piece of code, having these smaller breaking points can be
153145
of assistance.
154146

155147
Other Modularity
156148
----------------
157149

158-
Above we have mentioned the need for separate grammars or to isolate
159-
these per versions. But there are other major pieces that make up this
160-
decompiler. In particular there is a scanner and the source code
150+
Above, we have mentioned the need for separate grammars or to isolate
151+
these per version. But there are other major pieces that make up this
152+
decompiler. In particular, there is a scanner and the source code
161153
generation part.
162154

163155
Even though differences in version that occur in disassembly are
164-
handled by ``xdis``, we still have to do conversion of that to a token
165-
stream for parsing. So the scanners are again broken out per version
166-
with various OO mechanisms for reusing code. The same is true for
156+
handled by ``xdis``, we still have to do the conversion of that to a token
157+
stream for parsing. So the scanners are again broken out per version, with various OO mechanisms for reusing code. The same is true for
167158
source code generation.
168159

169160

170161
Expanding decompiler availability to multiple Python Versions
171162
--------------------------------------------------------------
172163

173-
Above we mention decompiling multiple versions of bytecode from a
164+
Above, we mentioned decompiling multiple versions of bytecode from a
174165
single Python interpreter. Here, we talk about having the decompiler
175-
runnable from multiple versions of Python, independent of the set of
166+
runnable from many versions of Python, independent of the set of
176167
bytecode that the decompiler supports.
177168

178169

179170
There are slight advantages in having a decompiler that runs the same
180171
version as the code you are decompiling. The most obvious one is that
181-
it makes it easy to test to see whether the decompilation correct
172+
it makes it easy to test to see whether the decompilation is correct
182173
because you can run the decompiled code. Python comes with a suite of
183-
Python programs that check themselves and that aspects of Python are
174+
Python programs that check themselves and ensure that aspects of Python are
184175
implemented correctly. These also make excellent programs to check
185-
whether a program has decompiled correctly.
176+
whether a program has been decompiled correctly.
186177

187178
Aside from this, debugging can be easier as well. To assist
188-
understanding bytecode and single stepping it see `x-python
179+
understanding bytecode and single-stepping it, see `x-python
189180
<https://pypi.org/project/x-python/>`_ and the debugger for it
190181
`trepan-xpy <https://pypi.org/project/trepanxpy/>`_.
191182

192183
Handling Language Drift
193184
-----------------------
194185

195-
Given the desirability of having this code running on logs of Python
186+
Given the desirability of having this code running on many Python
196187
versions, how can we get this done?
197188

198189
The solution used here is to have several git branches of the
199-
code. Right now there are 3 branches. Each branch handles works across
200-
3 or so different releases of Python. In particular one branch handles
190+
code. Right now, there are 3 branches. Each branch works across
191+
3 or so different releases of Python. In particular, one branch handles
201192
Python 2.4 to 2.7. Another handles Python 3.3 to 3.5, and the master
202-
branch handles 3.6 to 3.10. (Again note that the 3.9 and 3.10
193+
branch handles 3.6 to 3.10. (Again, note that the 3.9 and 3.10
203194
decompilers do not decompile Python 3.9 or 3.10, but they do handle
204195
bytecode for all earlier versions.)
205196

@@ -211,8 +202,8 @@ Cool features of the Parser
211202
* numbering tokens
212203
* showing a stack of completions
213204

214-
Cool features Semantic Analysis
215-
===============================
205+
Cool features of Semantic Analysis
206+
===================================
216207

217208
* ``--tree++`` (``-T``) option
218209
* showing precedence
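
The "Enter ``xdis``" text in this diff argues that with a list of structured instructions you take the ``offset`` field of the previous or next instruction instead of adding magic numbers like 3 or 4 to a byte offset. Below is a minimal sketch of that idea. It assumes Python 3.4 or later and uses the standard library's ``dis.get_instructions`` as a stand-in; it is not the ``xdis`` API and not code from ``uncompyle6``. The names ``neighbors`` and ``sample`` are hypothetical and exist only for illustration.

```python
import dis


def neighbors(code):
    """Return (opname, offset, prev_offset, next_offset, is_jump) per instruction.

    Hypothetical helper: the previous and next offsets come from the
    neighboring list entries, so no instruction-size arithmetic is needed.
    """
    instructions = list(dis.get_instructions(code))
    rows = []
    for i, inst in enumerate(instructions):
        prev_offset = instructions[i - 1].offset if i > 0 else None
        next_offset = (
            instructions[i + 1].offset if i + 1 < len(instructions) else None
        )
        # Ask the opcode tables whether this is a jump, rather than
        # comparing against a hand-maintained tuple of opcode constants.
        is_jump = inst.opcode in dis.hasjrel or inst.opcode in dis.hasjabs
        rows.append((inst.opname, inst.offset, prev_offset, next_offset, is_jump))
    return rows


def sample(x):
    # Small function whose bytecode contains a conditional branch.
    if x:
        return 1
    return 2


rows = neighbors(sample.__code__)
for row in rows:
    print(row)
```

The exact opcode names and offsets printed depend on the interpreter version running the sketch, which is the limitation of the standard ``dis`` module that the document describes.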
