Library reference¶
Python ASN.1 DER/CER/BER codec with abstract structures
This library allows you to marshal various structures in ASN.1 DER/CER format, and to unmarshal BER/CER/DER ones.
>>> i = Integer(123)
>>> raw = i.encode()
>>> Integer().decod(raw) == i
True
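To give a feel for the bytes behind such a round trip, here is a minimal standard-library sketch of DER INTEGER encoding. It is not PyDERASN's implementation: `der_integer` is a hypothetical helper name, and values needing 128 or more content octets are deliberately left out.

```python
def der_integer(value: int) -> bytes:
    """Encode a Python int as a DER INTEGER (universal tag 0x02)."""
    # Minimal two's-complement size: magnitude bits plus one sign bit.
    bits = value.bit_length() if value >= 0 else (~value).bit_length()
    length = bits // 8 + 1
    if length >= 0x80:
        raise ValueError("long-form lengths are left out of this sketch")
    body = value.to_bytes(length, "big", signed=True)
    return bytes([0x02, length]) + body


print(der_integer(123).hex())     # 02017b
print(der_integer(-12345).hex())  # 0202cfc7
```

The two content octets of the second value agree with the `[1,1, 2]` lengths shown for `Integer(-12345)` in the pretty printing section.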
There are primitive types, holding single values (pyderasn.BitString, pyderasn.Boolean, pyderasn.Enumerated, pyderasn.GeneralizedTime, pyderasn.Integer, pyderasn.Null, pyderasn.ObjectIdentifier, pyderasn.OctetString, pyderasn.UTCTime, various strings (pyderasn.BMPString, pyderasn.GeneralString, pyderasn.GraphicString, pyderasn.IA5String, pyderasn.ISO646String, pyderasn.NumericString, pyderasn.PrintableString, pyderasn.T61String, pyderasn.TeletexString, pyderasn.UniversalString, pyderasn.UTF8String, pyderasn.VideotexString, pyderasn.VisibleString)), constructed types, holding multiple primitive types (pyderasn.Sequence, pyderasn.SequenceOf, pyderasn.Set, pyderasn.SetOf), and special types like pyderasn.Any and pyderasn.Choice.
Common for most types¶
Default/optional¶
Many objects in sequences can be OPTIONAL and can have a DEFAULT value. You can specify those properties using the corresponding keyword arguments.
>>> Integer(optional=True, default=123)
INTEGER 123 OPTIONAL DEFAULT
Those specifications do not play any role in primitive value encoding, but are taken into account when dealing with sequences holding them. For example, the TBSCertificate sequence holds a defaulted, explicitly tagged version field:
class Version(Integer):
    schema = (
        ("v1", 0),
        ("v2", 1),
        ("v3", 2),
    )
class TBSCertificate(Sequence):
    schema = (
        ("version", Version(expl=tag_ctxc(0), default="v1")),
        [...]
When the default argument is used and no value is specified, the object’s value equals the default one.
Size constraints¶
Some objects allow you to set value size constraints. A constraint is either the allowed range of an integer value, or the allowed length of various strings and sequences. Constraints are set in the following way:
class X(...):
    bounds = (MIN, MAX)
Values are checked to satisfy MIN <= X <= MAX.
For simplicity you can also set bounds the following way:
bounded_x = X(bounds=(MIN, MAX))
If bounds are not satisfied, then pyderasn.BoundsError is raised.
Common methods¶
All objects have a ready boolean property, that tells if the object is ready to be encoded. If encoding is attempted on an unready object, then the pyderasn.ObjNotReady exception will be raised.
All objects are friendly to copy.copy() and copied objects can be safely mutated.
Also all objects can be safely pickle-d, but pay attention that pickling among different PyDERASN versions is prohibited.
Decoding¶
Decoding is performed using the pyderasn.Obj.decode() method. The optional offset argument can be used to set the initial object’s offset in the binary data, for convenience. The method returns the decoded object and the remaining unmarshalled data (tail). Internally all work is done on memoryview(data), and you can leave the returned tail as a memoryview by specifying the leavemm=True argument.
Also note the convenient pyderasn.Obj.decod() method, that immediately checks and raises if there is a non-empty tail.
When an object is decoded, its decoded property is true and you can safely use the following properties:
- offset – position including initial offset where object’s tag starts
- tlen – length of object’s tag
- llen – length of object’s length value
- vlen – length of object’s value
- tlvlen – length of the whole object
Pay attention that those values do not include anything related to explicit tag. If you want to know information about it, then use:
- expled – to know if explicit tag is set
- expl_offset (it is less than offset)
- expl_tlen, expl_llen
- expl_vlen (that actually equals to ordinary tlvlen)
- fulloffset – it equals to expl_offset if explicit tag is set, offset otherwise
- fulllen – it equals to expl_len if explicit tag is set, tlvlen otherwise
When an error occurs, pyderasn.DecodeError is raised.
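The relationship between the tag, length and value sizes can be sketched with the standard library alone. This is an illustration of the TLV layout, not PyDERASN code: `TLVLens` and `tlv_lens` are hypothetical names, and only definite lengths are handled.

```python
from typing import NamedTuple


class TLVLens(NamedTuple):
    tlen: int  # octets taken by the tag
    llen: int  # octets taken by the encoded length
    vlen: int  # octets taken by the value

    @property
    def tlvlen(self) -> int:
        return self.tlen + self.llen + self.vlen


def tlv_lens(data: bytes, offset: int = 0) -> TLVLens:
    """Measure tag, length and value sizes of a definite-length TLV."""
    pos = offset
    first = data[pos]
    pos += 1
    # High tag number form: low five bits all set, continuation octets follow.
    if first & 0x1F == 0x1F:
        while data[pos] & 0x80:
            pos += 1
        pos += 1
    tlen = pos - offset
    lfirst = data[pos]
    if lfirst < 0x80:
        # Short form: the octet itself is the value length.
        return TLVLens(tlen, 1, lfirst)
    if lfirst == 0x80:
        raise ValueError("indefinite length is not handled in this sketch")
    # Long form: low seven bits give the number of length octets.
    n = lfirst & 0x7F
    vlen = int.from_bytes(data[pos + 1:pos + 1 + n], "big")
    return TLVLens(tlen, 1 + n, vlen)


print(tlv_lens(bytes.fromhex("0202cfc7")))         # TLVLens(tlen=1, llen=1, vlen=2)
print(tlv_lens(bytes.fromhex("30820644")).tlvlen)  # 1608
```

The second header, `30 82 06 44`, is what a SEQUENCE with 1604 value octets starts with.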
Context¶
You can specify the so-called context keyword argument during pyderasn.Obj.decode() invocation. It is a dictionary containing various options governing the decoding process.
Context options described in this reference: bered, allow_expl_oob, defines_by_path, evgen_mode_upto, keep_memoryview.
Pretty printing¶
All objects have a pps() method, that is a generator of pyderasn.PP namedtuples, holding various raw information about the object. If pps is called on sequences, then all underlying PPs will be yielded.
You can use the pyderasn.pp_console_row() function to convert those PPs to human readable strings. It is exactly what is used for all objects’ repr, but it is easy to write custom formatters.
>>> from pyderasn import pprint
>>> encoded = Integer(-12345).encode()
>>> obj, tail = Integer().decode(encoded)
>>> print(pprint(obj))
0 [1,1, 2] INTEGER -12345
Example certificate:
>>> print(pprint(crt))
0 [1,3,1604] Certificate SEQUENCE
4 [1,3,1453] . tbsCertificate: TBSCertificate SEQUENCE
10-2 [1,1, 1] . . version: [0] EXPLICIT Version INTEGER v3 OPTIONAL
13 [1,1, 3] . . serialNumber: CertificateSerialNumber INTEGER 61595
18 [1,1, 13] . . signature: AlgorithmIdentifier SEQUENCE
20 [1,1, 9] . . . algorithm: OBJECT IDENTIFIER 1.2.840.113549.1.1.5
31 [0,0, 2] . . . parameters: [UNIV 5] ANY OPTIONAL
. . . . 05:00
33 [0,0, 278] . . issuer: Name CHOICE rdnSequence
33 [1,3, 274] . . . rdnSequence: RDNSequence SEQUENCE OF
37 [1,1, 11] . . . . 0: RelativeDistinguishedName SET OF
39 [1,1, 9] . . . . . 0: AttributeTypeAndValue SEQUENCE
41 [1,1, 3] . . . . . . type: AttributeType OBJECT IDENTIFIER 2.5.4.6
46 [0,0, 4] . . . . . . value: [UNIV 19] AttributeValue ANY
. . . . . . . 13:02:45:53
[...]
1461 [1,1, 13] . signatureAlgorithm: AlgorithmIdentifier SEQUENCE
1463 [1,1, 9] . . algorithm: OBJECT IDENTIFIER 1.2.840.113549.1.1.5
1474 [0,0, 2] . . parameters: [UNIV 5] ANY OPTIONAL
. . . 05:00
1476 [1,2, 129] . signatureValue: BIT STRING 1024 bits
. . 68:EE:79:97:97:DD:3B:EF:16:6A:06:F2:14:9A:6E:CD
. . 9E:12:F7:AA:83:10:BD:D1:7C:98:FA:C7:AE:D4:0E:2C
[...]
Trailing data: 0a
Let’s parse that output, human:
10-2 [1,1, 1] . . version: [0] EXPLICIT Version INTEGER v3 OPTIONAL
^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^
0 1 2 3 4 5 6 7 8 9 10 11
20 [1,1, 9] . . . algorithm: OBJECT IDENTIFIER 1.2.840.113549.1.1.5
^ ^ ^ ^ ^ ^ ^ ^
0 2 3 4 5 6 9 10
33 [0,0, 278] . . issuer: Name CHOICE rdnSequence
^ ^ ^ ^ ^ ^ ^ ^ ^
0 2 3 4 5 6 8 9 10
52-2∞ B [1,1,1054]∞ . . . . eContent: [0] EXPLICIT BER OCTET STRING 1046 bytes
^ ^ ^ ^ ^
12 13 14 9 10
- 0:
Offset of the object, where its DER/BER encoding begins. Pay attention that it does not include explicit tag.
- 1:
If explicit tag exists, then this is its length (tag + encoded length).
- 2:
Length of object’s tag. For example CHOICE does not have its own tag, so it is zero.
- 3:
Length of encoded length.
- 4:
Length of encoded value.
- 5:
Visual indentation to show the depth of object in the hierarchy.
- 6:
Object’s name inside SEQUENCE/CHOICE.
- 7:
If either IMPLICIT or EXPLICIT tag is set, then it will be shown here. “IMPLICIT” is omitted.
- 8:
Object’s class name, if set. Omitted if it is just an ordinary simple value (like with algorithm in example above).
- 9:
Object’s ASN.1 type.
- 10:
Object’s value, if set. Can consist of multiple words (like OCTET/BIT STRINGs above). We see v3 value in Version, because it is named. rdnSequence is the choice of CHOICE type.
- 11:
Possible other flags like OPTIONAL and DEFAULT, if value equals to the default one, specified in the schema.
- 12:
Shows whether the object contains any kind of BER encoded data (possibly a Sequence holding a BER-encoded underlying value).
- 13:
Only applicable to BER encoded data. Indefinite length encoding mark.
- 14:
Only applicable to BER encoded data. If object has BER-specific encoding, then BER will be shown. It does not depend on indefinite length encoding. EOC, BOOLEAN, BIT STRING, OCTET STRING (and its derivatives), SET, SET OF, UTCTime, GeneralizedTime could be BERed.
Also it could be helpful to add quick ASN.1 pprinting command in your pdb’s configuration file:
alias pp1 import pyderasn ;; print(pyderasn.pprint(%1, oid_maps=(locals().get("OID_STR_TO_NAME", {}),)))
DEFINED BY¶
ASN.1 structures often have ANY and OCTET STRING fields, that are DEFINED BY some previously met ObjectIdentifier. This library lets you specify a mapping between an OID and the specification that the dependent field must be decoded with.
defines kwarg¶
A pyderasn.ObjectIdentifier field inside a pyderasn.Sequence can hold a mapping between OIDs and the structures needed for decoding. For example, the CMS (RFC 5652) container:
class ContentInfo(Sequence):
    schema = (
        ("contentType", ContentType(defines=((("content",), {
            id_digestedData: DigestedData(),
            id_signedData: SignedData(),
        }),))),
        ("content", Any(expl=tag_ctxc(0))),
    )
The contentType field tells that the content field must be decoded with the SignedData specification, if contentType equals to id-signedData. The same applies to DigestedData. If contentType contains an unknown OID, then no automatic decoding is done.
You can specify multiple fields that will be autodecoded, which is why the defines kwarg is a sequence. You can specify the defined field relatively or absolutely to the current decode path. For example, defines for the AlgorithmIdentifier of X.509’s tbsCertificate:subjectPublicKeyInfo:algorithm:algorithm:
(
    (("parameters",), {
        id_ecPublicKey: ECParameters(),
        id_GostR3410_2001: GostR34102001PublicKeyParameters(),
    }),
    (("..", "subjectPublicKey"), {
        id_rsaEncryption: RSAPublicKey(),
        id_GostR3410_2001: OctetString(),
    }),
),
tells that if the certificate’s SPKI algorithm is GOST R 34.10-2001, then both its parameters inside the SPKI’s algorithm and the public key itself are autodecoded.
Following types can be automatically decoded (DEFINED BY):
- pyderasn.BitString (that is multiple of 8 bits)
- pyderasn.SequenceOf/pyderasn.SetOf Any/BitString/OctetString-s
When any of those fields is automatically decoded, then its .defined attribute contains an (OID, value) tuple. OID tells by which OID it was defined, value contains the corresponding decoded value. For the example above, content_info["content"].defined == (id_signedData, signed_data).
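The idea behind DEFINED BY can be sketched without the library as a lookup from an OID to a decoder, yielding an (OID, value) pair or nothing for unknown OIDs. The OIDs, the `DECODERS` table and `decode_defined` below are all made up for illustration; real schemas map OIDs to PyDERASN objects as shown above.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical OID-to-decoder table (these OIDs are not registered anywhere).
DECODERS: Dict[str, Callable[[bytes], object]] = {
    "1.2.3.1": lambda raw: raw.decode("utf-8"),
    "1.2.3.2": lambda raw: int.from_bytes(raw, "big"),
}


def decode_defined(oid: str, raw: bytes) -> Optional[Tuple[str, object]]:
    """Decode raw according to oid; return None when the OID is unknown."""
    decoder = DECODERS.get(oid)
    if decoder is None:
        return None  # unknown OID: no automatic decoding is done
    return (oid, decoder(raw))


print(decode_defined("1.2.3.2", b"\x01\x00"))  # ('1.2.3.2', 256)
print(decode_defined("1.2.3.9", b"\x01\x00"))  # None
```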
defines_by_path context option¶
Sometimes you either can not or do not want to explicitly set defines in the schema. You can dynamically apply those definitions when calling the pyderasn.Obj.decode() method.
Specify the defines_by_path key in the decode context. Its value must be a sequence of the following tuples:
(decode_path, defines)
where decode_path is a tuple holding the so-called decode path to the exact pyderasn.ObjectIdentifier field you want to apply defines to, and defines holds exactly the same value as accepted in its keyword argument.
For example, again for CMS, you want to automatically decode SignedData and the CMC’s (RFC 5272) PKIData and PKIResponse structures it may hold. Also, automatically decode the controlSequence of PKIResponse:
content_info = ContentInfo().decod(data, ctx={"defines_by_path": (
    (
        ("contentType",),
        ((("content",), {id_signedData: SignedData()}),),
    ),
    (
        (
            "content",
            DecodePathDefBy(id_signedData),
            "encapContentInfo",
            "eContentType",
        ),
        ((("eContent",), {
            id_cct_PKIData: PKIData(),
            id_cct_PKIResponse: PKIResponse(),
        }),),
    ),
    (
        (
            "content",
            DecodePathDefBy(id_signedData),
            "encapContentInfo",
            "eContent",
            DecodePathDefBy(id_cct_PKIResponse),
            "controlSequence",
            any,
            "attrType",
        ),
        ((("attrValues",), {
            id_cmc_recipientNonce: RecipientNonce(),
            id_cmc_senderNonce: SenderNonce(),
            id_cmc_statusInfoV2: CMCStatusInfoV2(),
            id_cmc_transactionId: TransactionId(),
        }),),
    ),
)})
Pay attention to pyderasn.DecodePathDefBy and any. The first is useful for path construction when some automatic decoding is already done. any means literally any value it meets, which is useful for SEQUENCE/SET OF-s.
BER encoding¶
By default PyDERASN accepts only DER encoded data and encodes to DER. But you can optionally enable BER decoding by setting the bered context argument to True. Indefinite lengths and constructed primitive types should be parsed successfully.
- If object is encoded in BER form (not the DER one), then ber_encoded attribute is set to True. Only BOOLEAN, BIT STRING, OCTET STRING, OBJECT IDENTIFIER, SEQUENCE, SET, SET OF, UTCTime, GeneralizedTime can contain it.
- If object has an indefinite length encoding, then its lenindef attribute is set to True. Only BIT STRING, OCTET STRING, SEQUENCE, SET, SEQUENCE OF, SET OF, ANY can contain it.
- If object has an indefinite length encoded explicit tag, then expl_lenindef is set to True.
- If object has either any of BER-related encoding (explicit tag indefinite length, object’s indefinite length, BER-encoding) or any underlying component has that kind of encoding, then bered attribute is set to True. For example SignedData CMS can have ContentInfo:content:signerInfos:* bered value set to True, but ContentInfo:content:signerInfos:*:signedAttrs won’t.
The length of the EOC (end-of-contents) token is counted in the object’s value length.
Allow explicit tag out-of-bound¶
An invalid BER encoding could contain an EXPLICIT tag containing more than one value, more than one object. If you set the allow_expl_oob context option to True, then no error will be raised and that invalid encoding will be silently processed further. But pay attention that offsets and lengths will be invalid in that case.
Warning
This option should be used only for skipping some decode errors, just to see the decoded structure somehow.
Streaming and dealing with huge structures¶
evgen mode¶
ASN.1 structures can be huge, they can hold millions of objects inside (for example Certificate Revocation Lists (CRL), holding revocation state for every previously issued X.509 certificate). CACert.org’s 8 MiB CRL file takes more than half a gigabyte of memory to hold the decoded structure.
If you simply want to check the signature over the tbsCertList, you can create a specialized schema with that field represented as OctetString, for example:
class TBSCertListFast(Sequence):
    schema = (
        [...]
        ("revokedCertificates", OctetString(
            impl=SequenceOf.tag_default,
            optional=True,
        )),
        [...]
    )
This allows you to quickly decode a few fields and check the signature over the tbsCertList bytes.
But how can you get all the certificates’ serial numbers from it, once you trust that CRL after signature validation? You can use the so-called evgen (event generation) mode, to catch the events/facts of some successful object decoding. Let’s use the command line capabilities:
$ python -m pyderasn --schema tests.test_crl:CertificateList --evgen revoke.crl
10 [1,1, 1] . . version: Version INTEGER v2 (01) OPTIONAL
15 [1,1, 9] . . . algorithm: OBJECT IDENTIFIER 1.2.840.113549.1.1.13
26 [0,0, 2] . . . parameters: [UNIV 5] ANY OPTIONAL
13 [1,1, 13] . . signature: AlgorithmIdentifier SEQUENCE
34 [1,1, 3] . . . . . . type: AttributeType OBJECT IDENTIFIER 2.5.4.10
39 [0,0, 9] . . . . . . value: [UNIV 19] AttributeValue ANY
32 [1,1, 14] . . . . . 0: AttributeTypeAndValue SEQUENCE
30 [1,1, 16] . . . . 0: RelativeDistinguishedName SET OF
[...]
188 [1,1, 1] . . . . userCertificate: CertificateSerialNumber INTEGER 17 (11)
191 [1,1, 13] . . . . . utcTime: UTCTime UTCTime 2003-04-01T14:25:08
191 [0,0, 15] . . . . revocationDate: Time CHOICE utcTime
191 [1,1, 13] . . . . . utcTime: UTCTime UTCTime 2003-04-01T14:25:08
186 [1,1, 18] . . . 0: RevokedCertificate SEQUENCE
208 [1,1, 1] . . . . userCertificate: CertificateSerialNumber INTEGER 20 (14)
211 [1,1, 13] . . . . . utcTime: UTCTime UTCTime 2002-10-01T02:18:01
211 [0,0, 15] . . . . revocationDate: Time CHOICE utcTime
211 [1,1, 13] . . . . . utcTime: UTCTime UTCTime 2002-10-01T02:18:01
206 [1,1, 18] . . . 1: RevokedCertificate SEQUENCE
[...]
9144992 [0,0, 15] . . . . revocationDate: Time CHOICE utcTime
9144992 [1,1, 13] . . . . . utcTime: UTCTime UTCTime 2020-02-08T07:25:06
9144985 [1,1, 20] . . . 415755: RevokedCertificate SEQUENCE
181 [1,4,9144821] . . revokedCertificates: RevokedCertificates SEQUENCE OF OPTIONAL
5 [1,4,9144997] . tbsCertList: TBSCertList SEQUENCE
9145009 [1,1, 9] . . algorithm: OBJECT IDENTIFIER 1.2.840.113549.1.1.13
9145020 [0,0, 2] . . parameters: [UNIV 5] ANY OPTIONAL
9145007 [1,1, 13] . signatureAlgorithm: AlgorithmIdentifier SEQUENCE
9145022 [1,3, 513] . signatureValue: BIT STRING 4096 bits
0 [1,4,9145534] CertificateList SEQUENCE
Here we see how the decoder works: it decodes the SEQUENCE’s tag and length, then decodes the underlying values. It can not tell that the SEQUENCE is decoded until all of them are, so the event of the upper level SEQUENCE is the last one we see. The version field is just a single INTEGER: it is decoded and its event is fired immediately. Then we see that the algorithm and parameters fields are decoded and only after them the signature SEQUENCE is fired as successfully decoded. There are 4 events for each revoked certificate entry in that CRL: the userCertificate serial number, the utcTime of the revocationDate CHOICE and the CHOICE itself, and RevokedCertificate itself as one of the entities in the revokedCertificates SEQUENCE OF.
We can do that in our ordinary Python code and understand where we are by looking at deterministically generated decode paths (do not forget about the useful --print-decode-path CLI option). We must use the pyderasn.Obj.decode_evgen() method, instead of the ordinary pyderasn.Obj.decode(). It is a generator yielding (decode_path, obj, tail) tuples:
for decode_path, obj, _ in CertificateList().decode_evgen(crl_raw):
    if (
        len(decode_path) == 4 and
        decode_path[:2] == ("tbsCertList", "revokedCertificates") and
        decode_path[3] == "userCertificate"
    ):
        print("serial number:", int(obj))
It takes virtually no memory beyond what is needed to store a single object. You can easily use that mode to determine the required object’s .offset and .*len to be able to decode it separately, or maybe to verify a signature over it just by taking bytes by .offset and .tlvlen.
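The event order described above (children first, then their container) and the filtering by decode path can be imitated on a plain nested structure. This is only a conceptual sketch with hypothetical names (`walk_events`, the toy `crl` dictionary); unlike real evgen mode it works on data that is already fully in memory, and it assumes SEQUENCE OF elements are addressed by their index as a string.

```python
from typing import Any, Iterator, Tuple

Path = Tuple[str, ...]


def walk_events(node: Any, decode_path: Path = ()) -> Iterator[Tuple[Path, Any]]:
    """Yield (decode_path, value); a container is yielded after its children."""
    if isinstance(node, dict):
        for name, child in node.items():
            yield from walk_events(child, decode_path + (name,))
    elif isinstance(node, list):
        for idx, child in enumerate(node):
            yield from walk_events(child, decode_path + (str(idx),))
    yield decode_path, node


# Toy stand-in for a decoded CRL.
crl = {"tbsCertList": {
    "version": 1,
    "revokedCertificates": [{"userCertificate": 17}, {"userCertificate": 20}],
}}

serials = [
    value for path, value in walk_events(crl)
    if len(path) == 4
    and path[:2] == ("tbsCertList", "revokedCertificates")
    and path[3] == "userCertificate"
]
print(serials)  # [17, 20]
```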
evgen_mode_upto¶
There is full ability to get any kind of data from the CRL in the example above. However it is not too convenient to get the whole RevokedCertificate structure, that is pretty lightweight and one may not want to disassemble it. You can use the evgen_mode_upto ctx option that is semantically equal to defines_by_path: a list of decode paths mapped to any non-None value. If a specified decode path is met, then any subsequent objects won’t be decoded in evgen mode. That allows us to parse the CRL above with fully assembled RevokedCertificate:
for decode_path, obj, _ in CertificateList().decode_evgen(
    crl_raw,
    ctx={"evgen_mode_upto": (
        (("tbsCertList", "revokedCertificates", any), True),
    )},
):
    if (
        len(decode_path) == 3 and
        decode_path[:2] == ("tbsCertList", "revokedCertificates")
    ):
        print("serial number:", int(obj["userCertificate"]))
Note
SEQUENCE/SET values with DEFAULT specified are automatically decoded without evgen mode.
mmap-ed file¶
POSIX compliant systems have the mmap syscall, giving the ability to work with a memory mapped file. You can deal with the file like it was an ordinary binary string, allowing you not to load it into memory first. Also you can use it as an input for OCTET STRING, taking no Python memory for its storage.
There is a convenient pyderasn.file_mmaped() function that creates a read-only memoryview on the file contents:
with open("huge", "rb") as fd:
    raw = file_mmaped(fd)
    obj, tail = Something().decode(raw)
Warning
mmap maps the whole file, so it does not matter whether you seek-ed it before. Take the slice of the resulting memoryview with the required offset instead.
Note
If you use ZFS as underlying storage, then pay attention that currently most platforms do not deal well with ZFS ARC and the ordinary page cache used for mmaps. It can take twice the necessary size in memory: both in the page cache and in ZFS ARC.
That read-only memoryview can be safely used as a value inside decoded pyderasn.OctetString and pyderasn.Any objects. You can enable that by setting "keep_memoryview": True in the decode context. No OCTET STRING and ANY values will be copied to memory. Of course that works only in DER encoding, where the value is continuously encoded.
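The behaviour described above (the mapping always covers the whole file, and slicing the memoryview gives zero-copy access at an offset) can be tried with the standard `mmap` module directly. This sketch does not use pyderasn.file_mmaped(); the temporary file and its contents are invented for the demonstration.

```python
import mmap
import os
import tempfile

# Throwaway file: a 6-byte header followed by a tiny OCTET STRING TLV.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"HEADER" + b"\x04\x03abc")
    path = tmp.name

try:
    with open(path, "rb") as fd:
        fd.seek(6)  # has no effect on what gets mapped below
        with mmap.mmap(fd.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            with memoryview(mapped) as view:
                readonly = view.readonly
                header = bytes(view[:6])  # the mapping still starts at byte 0
                with view[6:] as tlv:     # zero-copy slice at the wanted offset
                    payload = bytes(tlv[2:2 + tlv[1]])
finally:
    os.unlink(path)

print(readonly, header, payload)  # True b'HEADER' b'abc'
```

The memoryviews are released before the mapping is closed; closing a mapping that still has exported views raises BufferError.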
CER encoding¶
We can parse any kind of data now, but how can we produce files streamingly, without storing their encoded representation in memory? SEQUENCE by default encodes in memory all its values, joins them in huge binary string, just to know the exact size of SEQUENCE’s value for encoding it in TLV. DER requires you to know all exact sizes of the objects.
You can use the CER encoding mode, that slightly differs from DER, but does not require exact sizes knowledge, allowing streaming encoding directly to some writer/buffer. Just use the pyderasn.Obj.encode_cer() method, providing the writer where the encoded data will flow:
with open("result", "wb") as fd:
    obj.encode_cer(fd.write)
buf = io.BytesIO()
obj.encode_cer(buf.write)
If you do not want to create an in-memory buffer every time, then you can use the pyderasn.encode_cer() function:
data = encode_cer(obj)
Remember that CER is not valid DER in most cases, so you have to use the bered ctx option during its decoding. Also, currently there is no validation that the provided CER is a valid one: you can only be sure that it is valid BER encoding.
Warning
SET OF values can not be streamingly encoded, because they are required to be sorted byte-by-byte. Big SET OF values will still take much memory. Use neither SET nor SET OF values, as modern ASN.1 also recommends.
Do not forget about using mmap-ed memoryviews for your OCTET STRINGs! They will be streamingly copied from underlying file to the buffer using 1 KB chunks.
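To illustrate why no total size is needed up front, here is a standard-library sketch of how a long OCTET STRING can be streamed in CER form: a constructed, indefinite-length wrapper, primitive fragments, and an end-of-contents marker. It is not PyDERASN's code; `cer_octet_string` and `der_len` are hypothetical names, and the 1000-octet fragment size is taken from the CER rules of X.690.

```python
import io
from typing import Callable

CHUNK = 1000  # CER fragment size for string types (X.690)


def der_len(n: int) -> bytes:
    """Encode a definite length in its shortest form."""
    if n < 0x80:
        return bytes([n])
    octets = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(octets)]) + octets


def cer_octet_string(data: bytes, writer: Callable[[bytes], int]) -> None:
    """Stream data as a CER OCTET STRING into writer, chunk by chunk."""
    view = memoryview(data)
    if len(view) <= CHUNK:
        # Short strings stay primitive, exactly as in DER.
        writer(b"\x04" + der_len(len(view)))
        writer(view)
        return
    writer(b"\x24\x80")  # constructed OCTET STRING, indefinite length
    for start in range(0, len(view), CHUNK):
        chunk = view[start:start + CHUNK]
        writer(b"\x04" + der_len(len(chunk)))
        writer(chunk)
    writer(b"\x00\x00")  # end-of-contents (EOC)


buf = io.BytesIO()
cer_octet_string(b"A" * 2500, buf.write)
print(len(buf.getvalue()))  # 2516
```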
Some structures require that some of their elements are forcefully DER encoded. For example, SignedData CMS requires you to encode SignedAttributes and X.509 certificates in DER form, allowing you to encode everything else in BER. You can tell any of the structures to be forcefully encoded in DER during CER encoding, by specifying the der_forced=True attribute:
class Certificate(Sequence):
    schema = (...)
    der_forced = True
class SignedAttributes(SetOf):
    schema = Attribute()
    bounds = (1, float("+inf"))
    der_forced = True
agg_octet_string¶
In most cases, a huge quantity of binary data is stored as OCTET STRING. CER encoding splits it into 1 KB chunks. BER allows splitting on various levels of chunks inclusion:
SOME STRING[CONSTRUCTED]
    OCTET STRING[CONSTRUCTED]
        OCTET STRING[PRIMITIVE]
            DATA CHUNK
        OCTET STRING[PRIMITIVE]
            DATA CHUNK
        OCTET STRING[PRIMITIVE]
            DATA CHUNK
    OCTET STRING[PRIMITIVE]
        DATA CHUNK
    OCTET STRING[CONSTRUCTED]
        OCTET STRING[PRIMITIVE]
            DATA CHUNK
        OCTET STRING[PRIMITIVE]
            DATA CHUNK
    OCTET STRING[CONSTRUCTED]
        OCTET STRING[CONSTRUCTED]
            OCTET STRING[PRIMITIVE]
                DATA CHUNK
You can not just take the offset and some .vlen of the STRING and treat it as the payload. If you decode it without evgen mode, then it will be automatically aggregated and bytes() will give the whole payload contents.
You are forced to use evgen mode for decoding with a small memory footprint. There is the convenient pyderasn.agg_octet_string() helper for reconstructing the payload. Let’s assume you have got a BER/CER encoded ContentInfo with huge SignedData and EncapsulatedContentInfo. Let’s calculate the SHA512 digest of its eContent:
from hashlib import sha512

fd = open("data.p7m", "rb")
raw = file_mmaped(fd)
ctx = {"bered": True}
for decode_path, obj, _ in ContentInfo().decode_evgen(raw, ctx=ctx):
    if decode_path == ("content",):
        content = obj
        break
else:
    raise ValueError("no content found")
hasher_state = sha512()
def hasher(data):
    hasher_state.update(data)
    return len(data)
evgens = SignedData().decode_evgen(
    raw[content.offset:],
    offset=content.offset,
    ctx=ctx,
)
agg_octet_string(evgens, ("encapContentInfo", "eContent"), raw, hasher)
fd.close()
digest = hasher_state.digest()
Simply replace hasher with some writeable file’s fd.write to copy the payload (without BER/CER encoding interleaved overhead) into it. It will take virtually no more memory than is needed for keeping small structures and 1 KB binary chunks.
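What such aggregation has to do can be sketched with the standard library: walk the nested constructed OCTET STRINGs, skip the interleaved tags, lengths and EOC markers, and feed only the primitive chunks to a writer. This is a simplified stand-in, not the pyderasn.agg_octet_string() implementation; `read_len`, `agg` and the hand-built sample bytes are hypothetical, and only the universal OCTET STRING tags 0x04/0x24 are recognised.

```python
import hashlib
from typing import Callable, Optional, Tuple


def read_len(data: memoryview, pos: int) -> Tuple[Optional[int], int]:
    """Return (length, position after it); length is None when indefinite."""
    first = data[pos]
    if first < 0x80:
        return first, pos + 1
    if first == 0x80:
        return None, pos + 1
    n = first & 0x7F
    return int.from_bytes(data[pos + 1:pos + 1 + n], "big"), pos + 1 + n


def agg(data: memoryview, pos: int, writer: Callable[[memoryview], int]) -> int:
    """Feed payload chunks of the OCTET STRING at pos to writer; return its end."""
    tag = data[pos]
    length, pos = read_len(data, pos + 1)
    if tag == 0x04:  # primitive: its value is one payload chunk
        if length is None:
            raise ValueError("primitive encoding needs a definite length")
        writer(data[pos:pos + length])
        return pos + length
    if tag != 0x24:
        raise ValueError("not an OCTET STRING")
    if length is None:  # constructed, indefinite: read until EOC
        while data[pos:pos + 2] != b"\x00\x00":
            pos = agg(data, pos, writer)
        return pos + 2
    end = pos + length  # constructed, definite
    while pos < end:
        pos = agg(data, pos, writer)
    return pos


# Hand-built sample: payload b"abcd" spread over several nesting levels.
raw = memoryview(bytes.fromhex(
    "2480" "2406040161040162" "040163" "2480" "2403040164" "0000" "0000"
))
hasher_state = hashlib.sha512()


def hasher(chunk: memoryview) -> int:
    hasher_state.update(chunk)
    return len(chunk)


end = agg(raw, 0, hasher)
print(end, hasher_state.hexdigest() == hashlib.sha512(b"abcd").hexdigest())  # 24 True
```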
SEQUENCE OF iterators¶
You can use iterators as a value in pyderasn.SequenceOf classes. The only difference from providing the full list of objects is that type and bounds checking is done during the encoding process. Also the sequence’s value will be emptied after encoding, forcing you to set its value again.
This is very useful when you have to create some huge objects, like CRLs, with thousands and millions of entities inside. You can write a generator taking necessary data from the database and yielding the RevokedCertificate objects. Only the binary representation of those objects will take memory during DER encoding.
2-pass DER encoding¶
There is the ability to do 2-pass encoding to DER, writing results directly to the specified writer (buffer, file, whatever). It could be 1.5+ times slower than ordinary encoding, but it takes little memory for 1st pass state storing. For example, the 1st pass state for CACert.org’s CRL with ~416K certificate entries takes nearly 3.5 MB of memory. SignedData with a several gigabyte EncapsulatedContentInfo takes nearly 0.5 KB of memory.
If you use mmap-ed memoryviews, SEQUENCE OF iterators and write directly to opened file, then there is very small memory footprint.
1st pass traverses through all the objects of the structure and returns the size of DER encoded structure, together with 1st pass state object. That state contains precalculated lengths for various objects inside the structure.
fulllen, state = obj.encode1st()
2nd pass takes the writer and 1st pass state. It traverses through all the objects again, but writes their encoded representation to the writer.
with open("result", "wb") as fd:
    obj.encode2nd(fd.write, iter(state))
Warning
You MUST NOT use 1st pass state if anything is changed in the objects. It is intended to be used immediately after 1st pass is done!
If you use SEQUENCE OF iterators, then you have to reinitialize the values after the 1st pass. And you have to be sure that the iterator gives exactly the same values as previously. Yes, you have to run your iterator twice – because this is two pass encoding mode.
If you want to encode to the memory, then you can use the convenient pyderasn.encode2pass() helper.
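The two-pass idea can be sketched on a toy model where a Python list stands for a SEQUENCE and a bytes object for an OCTET STRING. The first pass only measures and records the value length of every SEQUENCE, the second pass writes using those recorded lengths in the same traversal order. The function names mirror the methods above, but this is an illustrative stand-in under those assumptions, not PyDERASN's implementation or its state format.

```python
import io
from typing import Callable, Iterator, List, Union

Node = Union[bytes, list]  # bytes: OCTET STRING, list: SEQUENCE


def der_len(n: int) -> bytes:
    """Encode a definite length in its shortest form."""
    if n < 0x80:
        return bytes([n])
    octets = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(octets)]) + octets


def encode1st(node: Node, state: List[int]) -> int:
    """Return the full TLV length of node, recording SEQUENCE value lengths."""
    if isinstance(node, bytes):
        vlen = len(node)
    else:
        idx = len(state)
        state.append(0)  # slot reserved on entry, filled once children are measured
        vlen = sum(encode1st(child, state) for child in node)
        state[idx] = vlen
    return 1 + len(der_len(vlen)) + vlen


def encode2nd(node: Node, writer: Callable[[bytes], int], state_iter: Iterator[int]) -> None:
    """Write node to writer, taking SEQUENCE lengths from the 1st pass state."""
    if isinstance(node, bytes):
        writer(b"\x04" + der_len(len(node)))
        writer(node)
    else:
        writer(b"\x30" + der_len(next(state_iter)))
        for child in node:
            encode2nd(child, writer, state_iter)


obj = [b"ab", [b"c", b""], b"d" * 200]
state: List[int] = []
fulllen = encode1st(obj, state)
buf = io.BytesIO()
encode2nd(obj, buf.write, iter(state))
print(fulllen, state)  # 217 [214, 5]
```

No intermediate encoding of a whole SEQUENCE is ever joined in memory; only the list of lengths is kept between the passes.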
ASN.1 browser¶
- pyderasn.browse(raw, obj, oid_maps=())¶
Interactive browser
- Parameters:
raw (bytes) – binary data you decoded
obj – decoded pyderasn.Obj
oid_maps – list of str(OID) <-> human readable string dictionaries. Its human readable form is printed when OID is met
Note
urwid dependency required
This browser is an interactive terminal application for browsing structures of your decoded ASN.1 objects. You can quit it with q key. It consists of three windows:
- Tree:
View of ASN.1 elements hierarchy. You can navigate it using Up, Down, PageUp, PageDown, Home, End keys. Left key goes to constructed element above. Plus/Minus keys collapse/uncollapse constructed elements. Space toggles it
- Info:
window with various information about element. You can scroll it with h/l (down, up) (H/L for triple speed) keys
- Hexdump:
window with raw data hexdump and highlighted current element’s contents. It automatically focuses on element’s data. You can scroll it with j/k (down, up) (J/K for triple speed) keys. If element has explicit tag, then it also will be highlighted with different colour
Window’s header contains current decode path and progress bars with position in info and hexdump windows.
If you press d, then current element will be saved in the current directory under its decode path name (adding “.0”, “.1”, etc suffix if such file already exists). D will save it with explicit tag.
You can also invoke it with the --browse command line argument.
Base Obj¶
- class pyderasn.Obj(impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0))¶
Common ASN.1 object class
All ASN.1 types are inherited from it. It has a metaclass that automatically adds __slots__ to all inherited classes.
- property bered¶
Is either the object or any element inside it BER encoded?
- decod(data, offset=0, decode_path=(), ctx=None)¶
Decode the data, check that tail is empty
- Raises:
ExceedingData – if tail is not empty
This is just a wrapper over pyderasn.Obj.decode() (decode without tail) that also checks that there is no trailing data left.
- decode(data, offset=0, leavemm=False, decode_path=(), ctx=None, tag_only=False, _ctx_immutable=True)¶
Decode the data
- Parameters:
data – either binary or memoryview
offset (int) – initial data’s offset
leavemm (bool) – do we need to leave memoryview of remaining data as is, or convert it to bytes otherwise
decode_path – current decode path (tuples of strings, possibly with DecodePathDefBy) which will be the root for all underlying objects
ctx – optional context governing decoding process
tag_only (bool) – decode only the tag, without length and contents (used only in Choice and Set structures, trying to determine if tag satisfies the schema)
_ctx_immutable (bool) – do we need to copy.copy() ctx before using it?
- Returns:
(Obj, remaining data)
- decode_evgen(data, offset=0, leavemm=False, decode_path=(), ctx=None, tag_only=False, _ctx_immutable=True, _evgen_mode=True)¶
Decode with evgen mode on
That method is identical to pyderasn.Obj.decode(), but it returns the generator producing (decode_path, obj, tail) values. See also: evgen mode.
- property decoded¶
Is object decoded?
- encode()¶
DER encode the structure
- Returns:
DER representation
- encode1st(state=None)¶
Do the 1st pass of 2-pass encoding
- Return type:
(int, array("L"))
- Returns:
full length of encoded data and precalculated various objects lengths
- encode2nd(writer, state_iter)¶
Do the 2nd pass of 2-pass encoding
- Parameters:
writer – must comply with io.RawIOBase.write behaviour
state_iter – iterator over the 1st pass state (iter(state))
- encode_cer(writer)¶
CER encode the structure to specified writer
- Parameters:
writer – must comply with io.RawIOBase.write behaviour. It takes the slice to be written and returns the number of bytes processed. If it returns None, then an exception will be raised
- hexdecod(data, *args, **kwargs)¶
Do pyderasn.Obj.decod() with hexadecimal decoded data
- hexdecode(data, *args, **kwargs)¶
Do pyderasn.Obj.decode() with hexadecimal decoded data
- hexencode()¶
Do hexadecimal encoded pyderasn.Obj.encode()
- property ready¶
Is object ready to be encoded?
- property tag_order¶
Tag’s (class, number) used for DER/CER sorting
Primitive types¶
Boolean¶
- class pyderasn.Boolean(value=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0))¶
BOOLEAN boolean type
>>> b = Boolean(True)
BOOLEAN True
>>> b == Boolean(True)
True
>>> bool(b)
True
- __init__(value=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0))¶
- Parameters:
value – set the value. Either boolean type, or pyderasn.Boolean object
impl (bytes) – override default tag with IMPLICIT one
expl (bytes) – override default tag with EXPLICIT one
default – set default value. Type same as in value
optional (bool) – is object OPTIONAL in sequence
Integer¶
- class pyderasn.Integer(value=None, bounds=None, impl=None, expl=None, default=None, optional=False, _specs=None, _decoded=(0, 0, 0))¶
INTEGER integer type
>>> b = Integer(-123)
INTEGER -123
>>> b == Integer(-123)
True
>>> int(b)
-123
>>> Integer(2, bounds=(1, 3))
INTEGER 2
>>> Integer(5, bounds=(1, 3))
Traceback (most recent call last):
pyderasn.BoundsError: unsatisfied bounds: 1 <= 5 <= 3
class Version(Integer):
    schema = (
        ("v1", 0),
        ("v2", 1),
        ("v3", 2),
    )
>>> v = Version("v1")
Version INTEGER v1
>>> int(v)
0
>>> v.named
'v1'
>>> v.specs
{'v3': 2, 'v1': 0, 'v2': 1}
- __init__(value=None, bounds=None, impl=None, expl=None, default=None, optional=False, _specs=None, _decoded=(0, 0, 0))¶
- Parameters:
value – set the value. Either integer type, named value (if schema is specified in the class), or pyderasn.Integer object
bounds – set (MIN, MAX) value constraint. (-inf, +inf) by default
impl (bytes) – override default tag with IMPLICIT one
expl (bytes) – override default tag with EXPLICIT one
default – set default value. Type same as in value
optional (bool) – is object OPTIONAL in sequence
- property named¶
Return named representation (if exists) of the value
- tohex()¶
Hexadecimal representation
Use pyderasn.colonize_hex() for colonizing it.
BitString¶
- class pyderasn.BitString(value=None, impl=None, expl=None, default=None, optional=False, _specs=None, _decoded=(0, 0, 0))¶
BIT STRING
bit string type
>>> b = BitString(b"hello world")
BIT STRING 88 bits 68656c6c6f20776f726c64
>>> bytes(b)
b'hello world'
>>> b == b"hello world"
True
>>> b.bit_len
88
>>> BitString("'0A3B5F291CD'H")
BIT STRING 44 bits 0a3b5f291cd0
>>> b = BitString("'010110000000'B")
BIT STRING 12 bits 5800
>>> b.bit_len
12
>>> b[0], b[1], b[2], b[3]
(False, True, False, True)
>>> b[1000]
False
>>> [v for v in b]
[False, True, False, True, True, False, False, False, False, False, False, False]
class KeyUsage(BitString):
    schema = (
        ("digitalSignature", 0),
        ("nonRepudiation", 1),
        ("keyEncipherment", 2),
    )
>>> b = KeyUsage(("keyEncipherment", "nonRepudiation"))
KeyUsage BIT STRING 3 bits nonRepudiation, keyEncipherment
>>> b.named
['nonRepudiation', 'keyEncipherment']
>>> b.specs
{'nonRepudiation': 1, 'digitalSignature': 0, 'keyEncipherment': 2}
Note
Pay attention that BIT STRING can be encoded in both primitive and constructed forms. The decoder always checks for the constructed form tag in addition to the specified primitive one. If BER decoding is not enabled, the decoder will fail on the constructed form because of DER restrictions.
- __init__(value=None, impl=None, expl=None, default=None, optional=False, _specs=None, _decoded=(0, 0, 0))¶
- Parameters:
value – set the value. Either binary type, tuple of named values (if schema is specified in the class), string in 'XXX...'B form, or pyderasn.BitString object
impl (bytes) – override default tag with IMPLICIT one
expl (bytes) – override default tag with EXPLICIT one
default – set default value. Type same as in value
optional (bool) – is object OPTIONAL in sequence
- property bit_len¶
Returns number of bits in the string
- property named¶
Named representation (if exists) of the bits
- Returns:
[str(name), …]
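The 'XXX...'B and 'XXX...'H notations used in the BitString examples can be illustrated with a small standalone sketch. It is not pyderasn's implementation: the function name parse_bit_notation is hypothetical, and it only assumes that unused trailing bits of the last octet are zero-padded, which matches the "12 bits 5800" and "44 bits 0a3b5f291cd0" outputs shown above.

```python
def parse_bit_notation(text):
    """Parse 'xxxx'B or 'xxxx'H notation into (bit_len, octets).

    Hypothetical helper for illustration only, not part of pyderasn.
    """
    if len(text) < 3 or text[0] != "'" or text[-2] != "'":
        raise ValueError("expected 'xxxx'B or 'xxxx'H notation")
    digits, kind = text[1:-2], text[-1]
    if kind == "B":
        bits = digits
    elif kind == "H":
        # Every hexadecimal digit contributes exactly four bits
        bits = "".join(format(int(d, 16), "04b") for d in digits)
    else:
        raise ValueError("unknown notation suffix: %r" % kind)
    if any(c not in "01" for c in bits):
        raise ValueError("binary notation may contain only 0 and 1")
    bit_len = len(bits)
    # Pad with zero bits on the right up to a whole number of octets
    padded = bits + "0" * (-bit_len % 8)
    octets = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return bit_len, octets


bit_len, octets = parse_bit_notation("'010110000000'B")
print(bit_len, octets.hex())
```

The bit length is kept separately from the octets because the padding bits are not part of the value.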
OctetString¶
- class pyderasn.OctetString(value=None, bounds=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0), ctx=None)¶
OCTET STRING
binary string type
>>> s = OctetString(b"hello world")
OCTET STRING 11 bytes 68656c6c6f20776f726c64
>>> s == OctetString(b"hello world")
True
>>> bytes(s)
b'hello world'
>>> OctetString(b"hello", bounds=(4, 4))
Traceback (most recent call last):
pyderasn.BoundsError: unsatisfied bounds: 4 <= 5 <= 4
>>> OctetString(b"hell", bounds=(4, 4))
OCTET STRING 4 bytes 68656c6c
Memoryviews can be used as values. If a memoryview is made over an mmap-ed file, then it does not take up storage inside the OctetString itself. In CER encoding mode it will be streamed to the specified writer, copying 1 KB chunks.
- __init__(value=None, bounds=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0), ctx=None)¶
- Parameters:
value – set the value. Either binary type, or pyderasn.OctetString object
bounds – set (MIN, MAX) value size constraint. (-inf, +inf) by default
impl (bytes) – override default tag with IMPLICIT one
expl (bytes) – override default tag with EXPLICIT one
default – set default value. Type same as in value
optional (bool) – is object OPTIONAL in sequence
Null¶
- class pyderasn.Null(value=None, impl=None, expl=None, optional=False, _decoded=(0, 0, 0))¶
NULL
null object
>>> n = Null()
NULL
>>> n.ready
True
- __init__(value=None, impl=None, expl=None, optional=False, _decoded=(0, 0, 0))¶
- Parameters:
impl (bytes) – override default tag with IMPLICIT one
expl (bytes) – override default tag with EXPLICIT one
optional (bool) – is object OPTIONAL in sequence
ObjectIdentifier¶
- class pyderasn.ObjectIdentifier(value=None, defines=(), impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0))¶
OBJECT IDENTIFIER
OID type
>>> oid = ObjectIdentifier((1, 2, 3))
OBJECT IDENTIFIER 1.2.3
>>> oid == ObjectIdentifier("1.2.3")
True
>>> tuple(oid)
(1, 2, 3)
>>> str(oid)
'1.2.3'
>>> oid + (4, 5) + ObjectIdentifier("1.7")
OBJECT IDENTIFIER 1.2.3.4.5.1.7
>>> str(ObjectIdentifier((3, 1)))
Traceback (most recent call last):
pyderasn.InvalidOID: unacceptable first arc value
- __init__(value=None, defines=(), impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0))¶
- Parameters:
value – set the value. Either tuple of integers, string of “.”-concatenated integers, or pyderasn.ObjectIdentifier object
defines – sequence of tuples. Each tuple has two elements. The first one is a decode path, relative to the current one, pointing to the field defined by that OID. Read about relative paths in pyderasn.abs_decode_path(). The second element is an {OID: pyderasn.Obj()} dictionary, mapping the current OID value to the structure applied to the defined field.
impl (bytes) – override default tag with IMPLICIT one
expl (bytes) – override default tag with EXPLICIT one
default – set default value. Type same as in value
optional (bool) – is object OPTIONAL in sequence
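To see why a first arc of 3 is rejected and how arcs end up as bytes, here is a minimal standalone sketch of the X.690 content encoding of an OBJECT IDENTIFIER. It is not pyderasn's code: the function name encode_oid_content is hypothetical, and it raises a plain ValueError instead of pyderasn.InvalidOID.

```python
def encode_oid_content(arcs):
    """Encode OID arcs into the content octets of an OBJECT IDENTIFIER.

    Hypothetical helper for illustration only; tag and length are not included.
    """
    arcs = tuple(arcs)
    if len(arcs) < 2:
        raise ValueError("at least two arcs are required")
    first, second = arcs[0], arcs[1]
    if first not in (0, 1, 2):
        raise ValueError("unacceptable first arc value")
    if first < 2 and not 0 <= second <= 39:
        raise ValueError("unacceptable second arc value")
    # The first two arcs are packed into a single subidentifier
    subids = (40 * first + second,) + arcs[2:]
    out = bytearray()
    for subid in subids:
        if subid < 0:
            raise ValueError("arcs must be non-negative")
        # Base-128 digits, most significant first; all but the last
        # octet carry the continuation bit 0x80
        chunk = [subid & 0x7F]
        subid >>= 7
        while subid:
            chunk.append(0x80 | (subid & 0x7F))
            subid >>= 7
        out.extend(reversed(chunk))
    return bytes(out)


arcs = tuple(int(arc) for arc in "1.2.840.113549.1.1.5".split("."))
print(encode_oid_content(arcs).hex())
```

The nine resulting octets agree with the value length of 9 that the command-line output reports for OBJECT IDENTIFIER 1.2.840.113549.1.1.5.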
Enumerated¶
- class pyderasn.Enumerated(value=None, impl=None, expl=None, default=None, optional=False, _specs=None, _decoded=(0, 0, 0), bounds=None)¶
ENUMERATED
integer type
This type is identical to pyderasn.Integer, but requires schema to be specified and does not accept values missing from it.
CommonString¶
- class pyderasn.CommonString(value=None, bounds=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0), ctx=None)¶
Common class for all strings
Everything resembles pyderasn.OctetString, except for the ability to deal with unicode text strings.
>>> hexenc("привет мир".encode("utf-8"))
'd0bfd180d0b8d0b2d0b5d18220d0bcd0b8d180'
>>> UTF8String("привет мир") == UTF8String(hexdec("d0...80"))
True
>>> s = UTF8String("привет мир")
UTF8String UTF8String привет мир
>>> str(s)
'привет мир'
>>> hexenc(bytes(s))
'd0bfd180d0b8d0b2d0b5d18220d0bcd0b8d180'
>>> PrintableString("привет мир")
Traceback (most recent call last):
pyderasn.DecodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)
>>> BMPString("ада", bounds=(2, 2))
Traceback (most recent call last):
pyderasn.BoundsError: unsatisfied bounds: 2 <= 3 <= 2
>>> s = BMPString("ад", bounds=(2, 2))
>>> s.encoding
'utf-16-be'
>>> hexenc(bytes(s))
'04300434'
Class: text encoding, validation
pyderasn.UTF8String: utf-8
pyderasn.NumericString: ascii, proper alphabet validation
pyderasn.PrintableString: ascii, proper alphabet validation
pyderasn.TeletexString: iso-8859-1
pyderasn.T61String: iso-8859-1
pyderasn.VideotexString: iso-8859-1
pyderasn.IA5String: ascii, proper alphabet validation
pyderasn.GraphicString: iso-8859-1
pyderasn.VisibleString, pyderasn.ISO646String: ascii, proper alphabet validation
pyderasn.GeneralString: iso-8859-1
pyderasn.UniversalString: utf-32-be
pyderasn.BMPString: utf-16-be
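The encoding names listed for the string classes are ordinary Python codec names, so their effect can be tried with the standard library alone. The sketch below is only an illustration: the ENCODINGS dictionary and encode_text function are hypothetical and cover just a few of the classes.

```python
# Hypothetical mapping for illustration; only a subset of the classes
ENCODINGS = {
    "UTF8String": "utf-8",
    "PrintableString": "ascii",
    "UniversalString": "utf-32-be",
    "BMPString": "utf-16-be",
}


def encode_text(kind, text):
    """Encode text with the codec associated with the given string kind."""
    return text.encode(ENCODINGS[kind])


print(encode_text("BMPString", "ад").hex())
print(len(encode_text("UniversalString", "ад")))
try:
    encode_text("PrintableString", "привет мир")
except UnicodeEncodeError as err:
    print("rejected:", err.reason)
```

Two Cyrillic characters occupy four bytes in utf-16-be and eight in utf-32-be, while the BMPString example above reports a length of 3 for a three-character value, which suggests that bounds are counted in characters rather than bytes.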
NumericString¶
- class pyderasn.NumericString(*args, **kwargs)¶
Numeric string
Its value is properly sanitized: only ASCII digits with spaces can be stored.
>>> NumericString().allowable_chars
frozenset(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ' '])
PrintableString¶
- class pyderasn.PrintableString(value=None, bounds=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0), ctx=None, allow_asterisk=False, allow_ampersand=False)¶
Printable string
Its value is properly sanitized: see X.680 41.4 table 10.
>>> PrintableString().allowable_chars
frozenset([' ', "'", ..., 'z'])
>>> obj = PrintableString("foo*bar", allow_asterisk=True)
PrintableString PrintableString foo*bar
>>> obj.allow_asterisk, obj.allow_ampersand
(True, False)
- __init__(value=None, bounds=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0), ctx=None, allow_asterisk=False, allow_ampersand=False)¶
- Parameters:
allow_asterisk – allow asterisk character
allow_ampersand – allow ampersand character
- property allow_ampersand¶
Is ampersand character allowed?
- property allow_asterisk¶
Is asterisk character allowed?
IA5String¶
- class pyderasn.IA5String(*args, **kwargs)¶
IA5 string
Its value is properly sanitized: it is a mix of control characters, graphic characters, and the DEL character (0x7F). It is just 7-bit ASCII.
>>> IA5String().allowable_chars
frozenset(["NUL", ... "DEL"])
VisibleString¶
- class pyderasn.VisibleString(*args, **kwargs)¶
Visible string
Its value is properly sanitized. ASCII subset from space to tilde is allowed: http://www.itscj.ipsj.or.jp/iso-ir/006.pdf
>>> VisibleString().allowable_chars
frozenset([" ", ... "~"])
UTCTime¶
- class pyderasn.UTCTime(value=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0), bounds=None, ctx=None)¶
UTCTime
datetime type
>>> t = UTCTime(datetime(2017, 9, 30, 22, 7, 50, 123))
UTCTime UTCTime 2017-09-30T22:07:50
>>> str(t)
'170930220750Z'
>>> bytes(t)
b'170930220750Z'
>>> t.todatetime()
datetime.datetime(2017, 9, 30, 22, 7, 50)
>>> UTCTime(datetime(2057, 9, 30, 22, 7, 50)).todatetime()
datetime.datetime(1957, 9, 30, 22, 7, 50)
>>> UTCTime(datetime(2057, 9, 30, 22, 7, 50)).totzdatetime()
datetime.datetime(1957, 9, 30, 22, 7, 50, tzinfo=tzutc())
If a BER encoded value was met, then the ber_raw attribute will hold its raw representation.
Warning
Only naive datetime objects are supported. Library assumes that all work is done in UTC.
Warning
Pay attention that UTCTime can not hold a full year, so two-digit years below 50 are treated as 20xx and all others as 19xx, according to the X.509 recommendation. Use GeneralizedTime instead to remove the ambiguity.
Warning
No strict validation of UTC offsets is performed (only applicable to BER), just very crude checks:
minutes are not exceeding 60
offset value is not exceeding 14 hours
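The two-digit year rule from the warning above can be sketched without the library. This is only an illustration under stated assumptions: utctime_to_datetime is a hypothetical name, it accepts just the DER form YYMMDDHHMMSSZ, and it deliberately avoids strptime's %y, whose pivot differs from the 50-year rule described here.

```python
from datetime import datetime


def utctime_to_datetime(text, pivot=50):
    """Convert a YYMMDDHHMMSSZ string to a naive datetime.

    Hypothetical helper for illustration only, not part of pyderasn.
    Two-digit years below the pivot map to 20xx, the others to 19xx.
    """
    if len(text) != 13 or text[-1] != "Z" or not text[:-1].isdigit():
        raise ValueError("expected YYMMDDHHMMSSZ")
    yy = int(text[0:2])
    year = (2000 if yy < pivot else 1900) + yy
    return datetime(
        year,
        int(text[2:4]),
        int(text[4:6]),
        int(text[6:8]),
        int(text[8:10]),
        int(text[10:12]),
    )


print(utctime_to_datetime("170930220750Z"))
print(utctime_to_datetime("570930220750Z"))
```

The second call lands in 1957, the same result the todatetime() example gives for a value created from the year 2057.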
- __init__(value=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0), bounds=None, ctx=None)¶
- Parameters:
value – set the value. Either datetime type, or pyderasn.UTCTime object
impl (bytes) – override default tag with IMPLICIT one
expl (bytes) – override default tag with EXPLICIT one
default – set default value. Type same as in value
optional (bool) – is object OPTIONAL in sequence
GeneralizedTime¶
- class pyderasn.GeneralizedTime(value=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0), bounds=None, ctx=None)¶
GeneralizedTime
datetime type
This type is similar to pyderasn.UTCTime.
>>> t = GeneralizedTime(datetime(2017, 9, 30, 22, 7, 50, 123))
GeneralizedTime GeneralizedTime 2017-09-30T22:07:50.000123
>>> str(t)
'20170930220750.000123Z'
>>> t = GeneralizedTime(datetime(2057, 9, 30, 22, 7, 50))
GeneralizedTime GeneralizedTime 2057-09-30T22:07:50
Warning
Only naive datetime objects are supported. Library assumes that all work is done in UTC.
Warning
Only microsecond fractions are supported in DER encoding. pyderasn.DecodeError will be raised during decoding of higher precision values.
Warning
BER encoded data can lose information (accuracy) during decoding because of float transformations.
Warning
Zero year is unsupported.
- __init__(value=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0), bounds=None, ctx=None)¶
- Parameters:
value – set the value. Either datetime type, or pyderasn.UTCTime object
impl (bytes) – override default tag with IMPLICIT one
expl (bytes) – override default tag with EXPLICIT one
default – set default value. Type same as in value
optional (bool) – is object OPTIONAL in sequence
Special types¶
Choice¶
- class pyderasn.Choice(value=None, schema=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0))¶
CHOICE
special type
class GeneralName(Choice):
    schema = (
        ("rfc822Name", IA5String(impl=tag_ctxp(1))),
        ("dNSName", IA5String(impl=tag_ctxp(2))),
    )
>>> gn = GeneralName()
GeneralName CHOICE
>>> gn["rfc822Name"] = IA5String("foo@bar.baz")
GeneralName CHOICE rfc822Name[[1] IA5String IA5 foo@bar.baz]
>>> gn["dNSName"] = IA5String("bar.baz")
GeneralName CHOICE dNSName[[2] IA5String IA5 bar.baz]
>>> gn["rfc822Name"]
None
>>> gn["dNSName"]
[2] IA5String IA5 bar.baz
>>> gn.choice
'dNSName'
>>> gn.value == gn["dNSName"]
True
>>> gn.specs
OrderedDict([('rfc822Name', [1] IA5String IA5), ('dNSName', [2] IA5String IA5)])
>>> GeneralName(("rfc822Name", IA5String("foo@bar.baz")))
GeneralName CHOICE rfc822Name[[1] IA5String IA5 foo@bar.baz]
- __init__(value=None, schema=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0))¶
- Parameters:
value – set the value. Either (choice, value) tuple, or pyderasn.Choice object
impl (bytes) – can not be set, do not use it
expl (bytes) – override default tag with EXPLICIT one
default – set default value. Type same as in value
optional (bool) – is object OPTIONAL in sequence
- property choice¶
Name of the choice
- property value¶
Value of underlying choice
PrimitiveTypes¶
- class pyderasn.PrimitiveTypes(value=None, schema=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0))¶
Predefined CHOICE for all generic primitive types
It could be useful for general decoding of some unspecified values:
>>> PrimitiveTypes().decod(hexdec("0403666f6f")).value
OCTET STRING 3 bytes 666f6f
>>> PrimitiveTypes().decod(hexdec("0203123456")).value
INTEGER 1193046
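The two hexadecimal inputs above are small enough to take apart by hand. The sketch below is not how pyderasn decodes: read_tlv is a hypothetical helper that handles only single-octet tags and short-form lengths, which is all these examples need.

```python
def read_tlv(data):
    """Split one TLV into (tag_number, value, tail).

    Hypothetical helper for illustration only: it supports only tag
    numbers below 31 and short-form (single octet) lengths.
    """
    if len(data) < 2:
        raise ValueError("not enough data")
    tag, length = data[0], data[1]
    if tag & 0x1F == 0x1F:
        raise ValueError("high tag numbers are not handled in this sketch")
    if length & 0x80:
        raise ValueError("long and indefinite lengths are not handled in this sketch")
    if len(data) < 2 + length:
        raise ValueError("not enough data")
    return tag & 0x1F, data[2:2 + length], data[2 + length:]


num, value, tail = read_tlv(bytes.fromhex("0203123456"))
# Universal tag number 2 is INTEGER; its content is a big-endian
# two's complement number
print(num, int.from_bytes(value, "big", signed=True))

num, value, tail = read_tlv(bytes.fromhex("0403666f6f"))
# Universal tag number 4 is OCTET STRING
print(num, value)
```

The universal tag number is what lets a CHOICE over the primitive types pick the matching alternative without any schema.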
Any¶
- class pyderasn.Any(value=None, expl=None, optional=False, _decoded=(0, 0, 0))¶
ANY
special type
>>> Any(Integer(-123))
ANY INTEGER -123 (0X:7B)
>>> a = Any(OctetString(b"hello world").encode())
ANY 040b68656c6c6f20776f726c64
>>> hexenc(bytes(a))
'040b68656c6c6f20776f726c64'
- __init__(value=None, expl=None, optional=False, _decoded=(0, 0, 0))¶
- Parameters:
value – set the value. Either any kind of pyderasn’s ready object, or bytes. Pay attention that a raw binary value is not validated as a proper TLV, apart from decoding its tag
expl (bytes) – override default tag with EXPLICIT one
optional (bool) – is object OPTIONAL in sequence
Constructed types¶
Sequence¶
- class pyderasn.Sequence(value=None, schema=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0))¶
SEQUENCE
structure type
You have to make a specification of the sequence:
class Extension(Sequence):
    schema = (
        ("extnID", ObjectIdentifier()),
        ("critical", Boolean(default=False)),
        ("extnValue", OctetString()),
    )
Then, you can work with it as with a dictionary.
>>> ext = Extension()
>>> Extension().specs
OrderedDict([
    ('extnID', OBJECT IDENTIFIER),
    ('critical', BOOLEAN False OPTIONAL DEFAULT),
    ('extnValue', OCTET STRING),
])
>>> ext["extnID"] = "1.2.3"
Traceback (most recent call last):
pyderasn.InvalidValueType: invalid value type, expected: <class 'pyderasn.ObjectIdentifier'>
>>> ext["extnID"] = ObjectIdentifier("1.2.3")
You can determine if the sequence is ready to be encoded:
>>> ext.ready
False
>>> ext.encode()
Traceback (most recent call last):
pyderasn.ObjNotReady: object is not ready: extnValue
>>> ext["extnValue"] = OctetString(b"foobar")
>>> ext.ready
True
The value you assign must have the same type as in the corresponding specification, but it can have different tags and optional/default attributes; they will be taken from the specification automatically:
class TBSCertificate(Sequence):
    schema = (
        ("version", Version(expl=tag_ctxc(0), default="v1")),
        [...]
>>> tbs = TBSCertificate()
>>> tbs["version"] = Version("v2") # no need to explicitly add ``expl``
Assign None to remove a value from the sequence.
You can set values in a Sequence during its initialization:
>>> AlgorithmIdentifier((
    ("algorithm", ObjectIdentifier("1.2.3")),
    ("parameters", Any(Null()))
))
AlgorithmIdentifier SEQUENCE[algorithm: OBJECT IDENTIFIER 1.2.3; parameters: ANY 0500 OPTIONAL]
You can determine if a value exists (is set) in the sequence and take its value:
>>> "extnID" in ext, "extnValue" in ext, "critical" in ext
(True, True, False)
>>> ext["extnID"]
OBJECT IDENTIFIER 1.2.3
But pay attention that if a value has a default, then it is reported as absent (not in) from the sequence (because DEFAULT must not be encoded in DER), but you can still read its value:
>>> "critical" in ext, ext["critical"]
(False, BOOLEAN False)
>>> ext["critical"] = Boolean(True)
>>> "critical" in ext, ext["critical"]
(True, BOOLEAN True)
All defaulted values are always optional.
DER prohibits encoding of default values, so an error is raised if a default value is unexpectedly met during decoding. If the bered context option is set, then no error will be raised, but the bered attribute will be set. You can disable the strict check for encoded default values by setting the "allow_default_values": True context option.
All values with DEFAULT specified are decoded atomically in evgen mode. If a DEFAULT value is some kind of SEQUENCE, then it will be yielded as a single element, not disassembled. That is required for the DEFAULT existence check.
Two sequences are equal if they have equal specification (schema), implicit/explicit tagging and the same values.
- __init__(value=None, schema=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0))¶
Set¶
- class pyderasn.Set(value=None, schema=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0))¶
SET
structure type
Its usage is identical to pyderasn.Sequence.
DER prohibits unordered values encoding and will raise an error during decode. If the bered context option is set, then no error will occur. You can also disable the strict values ordering check by setting the "allow_unordered_set": True context option.
- __init__(value=None, schema=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0))¶
SequenceOf¶
- class pyderasn.SequenceOf(value=None, schema=None, bounds=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0))¶
SEQUENCE OF
sequence type
For that kind of type you must specify the object it will carry (bounds are here only as an example, they are not required):
class Ints(SequenceOf):
    schema = Integer()
    bounds = (0, 2)
>>> ints = Ints()
>>> ints.append(Integer(123))
>>> ints.append(Integer(234))
>>> ints
Ints SEQUENCE OF[INTEGER 123, INTEGER 234]
>>> [int(i) for i in ints]
[123, 234]
>>> ints.append(Integer(345))
Traceback (most recent call last):
pyderasn.BoundsError: unsatisfied bounds: 0 <= 3 <= 2
>>> ints[1]
INTEGER 234
>>> ints[1] = Integer(345)
>>> ints
Ints SEQUENCE OF[INTEGER 123, INTEGER 345]
You can initialize the sequence with preinitialized values:
>>> ints = Ints([Integer(123), Integer(234)])
You can also use an iterator as a value:
>>> ints = Ints(iter(Integer(i) for i in range(1000000)))
It won’t be iterated until the encoding process. Pay attention that bounds and required schema checks are done only during the encoding process in that case! After encode has been called, the value is reset to an empty list and you have to set it again. That mode is useful mainly with CER encoding mode, where all objects from the iterable will be streamed to the buffer, without copying all of them to memory first.
- __init__(value=None, schema=None, bounds=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0))¶
SetOf¶
- class pyderasn.SetOf(value=None, schema=None, bounds=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0))¶
SET OF
sequence type
Its usage is identical to pyderasn.SequenceOf.
- __init__(value=None, schema=None, bounds=None, impl=None, expl=None, default=None, optional=False, _decoded=(0, 0, 0))¶
Various¶
- pyderasn.abs_decode_path(decode_path, rel_path)¶
Create an absolute decode path from current and relative ones
- Parameters:
decode_path – current decode path, starting point. Tuple of strings
rel_path – relative path to decode_path. Tuple of strings. If the first tuple’s element is “/”, then treat it as an absolute path, ignoring decode_path as starting point. Also this tuple can contain “..” elements, each one stripping the last element from decode_path
>>> abs_decode_path(("foo", "bar"), ("baz", "whatever"))
("foo", "bar", "baz", "whatever")
>>> abs_decode_path(("foo", "bar", "baz"), ("..", "..", "whatever"))
("foo", "whatever")
>>> abs_decode_path(("foo", "bar"), ("/", "baz", "whatever"))
("baz", "whatever")
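The resolution rules can be mirrored in a few lines of plain Python. This is a sketch of the behaviour shown in the three examples, not the library's source: resolve_decode_path is a hypothetical name, and it assumes that ".." elements appear only at the start of rel_path.

```python
def resolve_decode_path(decode_path, rel_path):
    """Resolve rel_path against decode_path; both are tuples of strings.

    Hypothetical helper for illustration only, not part of pyderasn.
    """
    if rel_path and rel_path[0] == "/":
        # Absolute path: the starting point is ignored
        return tuple(rel_path[1:])
    base = tuple(decode_path)
    rest = tuple(rel_path)
    # Each leading ".." drops the last element of the starting point
    while rest and rest[0] == "..":
        base = base[:-1]
        rest = rest[1:]
    return base + rest


print(resolve_decode_path(("foo", "bar"), ("baz", "whatever")))
print(resolve_decode_path(("foo", "bar", "baz"), ("..", "..", "whatever")))
print(resolve_decode_path(("foo", "bar"), ("/", "baz", "whatever")))
```

Relative paths of this kind are what the defines argument of ObjectIdentifier uses to point from the OID field to the field it defines.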
- pyderasn.agg_octet_string(evgens, decode_path, raw, writer)¶
Aggregate constructed string (OctetString and its derivatives)
- Parameters:
evgens – iterator of generated events
decode_path – points to the string we want to decode
raw – sliceable (memoryview, bytearray, etc) with the data evgens are generated on
writer – where the string is going to be saved (for example buffer.write). Must comply with io.RawIOBase.write behaviour
- pyderasn.ascii_visualize(ba)¶
Output only ASCII printable characters, like in hexdump -C
Example output for given binary string (right part):
92 2b 39 20 65 91 e6 8e 95 93 1a 58 df 02 78 ea |.+9 e......X..x.| ^^^^^^^^^^^^^^^^
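The right-hand column of that example can be reproduced with the standard library. The function below is a hypothetical stand-in written for illustration, assuming that every byte outside the printable ASCII range 0x20..0x7E is shown as a dot.

```python
def printable_column(data):
    """Render bytes the way the right part of hexdump -C does.

    Hypothetical helper for illustration only, not part of pyderasn.
    """
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in data)


raw = bytes.fromhex("922b39206591e68e95931a58df0278ea")
print("|%s|" % printable_column(raw))
```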
- pyderasn.colonize_hex(hexed)¶
Separate hexadecimal string with colons
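A minimal sketch of what such colonizing could look like follows. The function name colonize is hypothetical, and the assumption that colons go between every pair of hexadecimal digits is based on the colon-separated byte dumps in the command-line output.

```python
def colonize(hexed):
    """Insert a colon between every two hexadecimal digits.

    Hypothetical helper for illustration only, not part of pyderasn.
    """
    return ":".join(hexed[i:i + 2] for i in range(0, len(hexed), 2))


print(colonize(bytes.fromhex("a003020102").hex()))
```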
- pyderasn.encode2pass(obj)¶
Encode (2-pass mode) to DER in memory buffer
- Returns bytes:
memory buffer contents
- pyderasn.encode_cer(obj)¶
Encode to CER in memory buffer
- Returns bytes:
memory buffer contents
- pyderasn.file_mmaped(fd)¶
Make mmap-ed memoryview for reading from file
- Parameters:
fd – file object
- Returns:
memoryview over read-only mmap-ing of the whole file
Warning
It does not work under Windows.
- pyderasn.hexenc(data)¶
Convert binary data to a hexadecimal string
- pyderasn.hexdec(data)¶
Convert a hexadecimal string to binary data
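The examples elsewhere in this reference pass bytes to hexenc and get a hexadecimal string back, and pass a hexadecimal string to hexdec to get bytes. The same round trip can be sketched with binascii; to_hex and from_hex are hypothetical names used only here, not the library's functions.

```python
from binascii import hexlify, unhexlify


def to_hex(data):
    """Binary data to a hexadecimal text string (the hexenc direction)."""
    return hexlify(data).decode("ascii")


def from_hex(text):
    """Hexadecimal string to binary data (the hexdec direction)."""
    return unhexlify(text)


print(to_hex("привет мир".encode("utf-8")))
print(from_hex("0403666f6f"))
```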
- pyderasn.hexdump(raw)¶
Generate hexdump -C like output
Rendered example:
00000000 30 80 30 80 a0 80 02 01 02 00 00 02 14 54 a5 18 |0.0..........T..|
00000010 69 ef 8b 3f 15 fd ea ad bd 47 e0 94 81 6b 06 6a |i..?.....G...k.j|
The result of that function is a generator of lines, where each line is a list of columns:
[
    [...],
    ["00000010 ", " 69", " ef", " 8b", " 3f", " 15", " fd", " ea", " ad ",
     " bd", " 47", " e0", " 94", " 81", " 6b", " 06", " 6a ",
     " |i..?.....G...k.j|"],
    [...],
]
- pyderasn.tag_encode(num, klass=0, form=0)¶
Encode tag to binary form
- Parameters:
num (int) – tag’s number
klass (int) – tag’s class (pyderasn.TagClassUniversal, pyderasn.TagClassContext, pyderasn.TagClassApplication, pyderasn.TagClassPrivate)
form (int) – tag’s form (pyderasn.TagFormPrimitive, pyderasn.TagFormConstructed)
- pyderasn.tag_decode(tag)¶
Decode tag from binary form
Warning
No validation is performed, assuming that it has already passed.
It returns tuple with three integers, as
pyderasn.tag_encode()
accepts.
- pyderasn.tag_ctxp(num)¶
Create CONTEXT PRIMITIVE tag
- pyderasn.tag_ctxc(num)¶
Create CONTEXT CONSTRUCTED tag
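For readers unfamiliar with how a tag's number, class and form are packed into bytes, here is a standalone sketch of the X.690 identifier octets. It is not pyderasn's implementation: encode_tag and the CLASS_*/FORM_* constants are hypothetical names, and their values follow the X.690 bit layout, which may or may not equal the values of the pyderasn.TagClass* and pyderasn.TagForm* constants.

```python
# Hypothetical constants following the X.690 identifier octet layout
CLASS_UNIVERSAL = 0x00
CLASS_APPLICATION = 0x40
CLASS_CONTEXT = 0x80
CLASS_PRIVATE = 0xC0
FORM_PRIMITIVE = 0x00
FORM_CONSTRUCTED = 0x20


def encode_tag(num, klass=CLASS_UNIVERSAL, form=FORM_PRIMITIVE):
    """Encode tag number, class and form into identifier octets."""
    if num < 0:
        raise ValueError("tag number must be non-negative")
    if num < 31:
        # Low tag number form: everything fits into a single octet
        return bytes([klass | form | num])
    # High tag number form: 0x1F marker, then base-128 digits with
    # the continuation bit set on all but the last octet
    chunk = [num & 0x7F]
    num >>= 7
    while num:
        chunk.append(0x80 | (num & 0x7F))
        num >>= 7
    return bytes([klass | form | 0x1F]) + bytes(reversed(chunk))


# Context-specific constructed tag 0, as in expl=tag_ctxc(0)
print(encode_tag(0, CLASS_CONTEXT, FORM_CONSTRUCTED).hex())
# Context-specific primitive tag 1, as in impl=tag_ctxp(1)
print(encode_tag(1, CLASS_CONTEXT, FORM_PRIMITIVE).hex())
```

The first result, a0, is the leading byte of the explicitly tagged version field (A0:03:02:01:02) visible in the command-line output.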
- class pyderasn.DecodeError(msg='', klass=None, decode_path=(), offset=0)¶
- __init__(msg='', klass=None, decode_path=(), offset=0)¶
- Parameters:
msg (str) – reason of decode failing
klass – optional exact DecodeError inherited class (like NotEnoughData, TagMismatch, InvalidLength)
decode_path – tuple of strings. It contains human readable names of the fields through which decoding process has passed
offset (int) – binary offset where failure happened
- class pyderasn.NotEnoughData(msg='', klass=None, decode_path=(), offset=0)¶
- class pyderasn.ExceedingData(nbytes)¶
- class pyderasn.LenIndefForm(msg='', klass=None, decode_path=(), offset=0)¶
- class pyderasn.TagMismatch(msg='', klass=None, decode_path=(), offset=0)¶
- class pyderasn.InvalidLength(msg='', klass=None, decode_path=(), offset=0)¶
- class pyderasn.InvalidOID(msg='', klass=None, decode_path=(), offset=0)¶
- class pyderasn.ObjUnknown(name)¶
- class pyderasn.ObjNotReady(name)¶
- class pyderasn.InvalidValueType(expected_types)¶
- class pyderasn.BoundsError(bound_min, value, bound_max)¶
Command-line usage¶
You can decode DER/BER files using command line abilities:
$ python -m pyderasn --schema tests.test_crts:Certificate path/to/file
If there is no schema for your file, then you can try parsing it without one, although IMPLICIT tags will often make that impossible. Still, the result is good enough for the certificate above:
$ python -m pyderasn path/to/file
0 [1,3,1604] . >: SEQUENCE OF
4 [1,3,1453] . . >: SEQUENCE OF
8 [0,0, 5] . . . . >: [0] ANY
. . . . . A0:03:02:01:02
13 [1,1, 3] . . . . >: INTEGER 61595
18 [1,1, 13] . . . . >: SEQUENCE OF
20 [1,1, 9] . . . . . . >: OBJECT IDENTIFIER 1.2.840.113549.1.1.5
31 [1,1, 0] . . . . . . >: NULL
33 [1,3, 274] . . . . >: SEQUENCE OF
37 [1,1, 11] . . . . . . >: SET OF
39 [1,1, 9] . . . . . . . . >: SEQUENCE OF
41 [1,1, 3] . . . . . . . . . . >: OBJECT IDENTIFIER 2.5.4.6
46 [1,1, 2] . . . . . . . . . . >: PrintableString PrintableString ES
[...]
1409 [1,1, 50] . . . . . . >: SEQUENCE OF
1411 [1,1, 8] . . . . . . . . >: OBJECT IDENTIFIER 1.3.6.1.5.5.7.1.1
1421 [1,1, 38] . . . . . . . . >: OCTET STRING 38 bytes
. . . . . . . . . 30:24:30:22:06:08:2B:06:01:05:05:07:30:01:86:16
. . . . . . . . . 68:74:74:70:3A:2F:2F:6F:63:73:70:2E:69:70:73:63
. . . . . . . . . 61:2E:63:6F:6D:2F
1461 [1,1, 13] . . >: SEQUENCE OF
1463 [1,1, 9] . . . . >: OBJECT IDENTIFIER 1.2.840.113549.1.1.5
1474 [1,1, 0] . . . . >: NULL
1476 [1,2, 129] . . >: BIT STRING 1024 bits
. . . 68:EE:79:97:97:DD:3B:EF:16:6A:06:F2:14:9A:6E:CD
. . . 9E:12:F7:AA:83:10:BD:D1:7C:98:FA:C7:AE:D4:0E:2C
[...]
Human readable OIDs¶
If you have got dictionaries with ObjectIdentifiers, like example one
from tests/test_crts.py
:
stroid2name = {
"1.2.840.113549.1.1.1": "id-rsaEncryption",
"1.2.840.113549.1.1.5": "id-sha1WithRSAEncryption",
[...]
"2.5.4.10": "id-at-organizationName",
"2.5.4.11": "id-at-organizationalUnitName",
}
then you can pass it to pretty printer to see human readable OIDs:
$ python -m pyderasn --oids tests.test_crts:stroid2name path/to/file
[...]
37 [1,1, 11] . . . . . . >: SET OF
39 [1,1, 9] . . . . . . . . >: SEQUENCE OF
41 [1,1, 3] . . . . . . . . . . >: OBJECT IDENTIFIER id-at-countryName (2.5.4.6)
46 [1,1, 2] . . . . . . . . . . >: PrintableString PrintableString ES
50 [1,1, 18] . . . . . . >: SET OF
52 [1,1, 16] . . . . . . . . >: SEQUENCE OF
54 [1,1, 3] . . . . . . . . . . >: OBJECT IDENTIFIER id-at-stateOrProvinceName (2.5.4.8)
59 [1,1, 9] . . . . . . . . . . >: PrintableString PrintableString Barcelona
70 [1,1, 18] . . . . . . >: SET OF
72 [1,1, 16] . . . . . . . . >: SEQUENCE OF
74 [1,1, 3] . . . . . . . . . . >: OBJECT IDENTIFIER id-at-localityName (2.5.4.7)
79 [1,1, 9] . . . . . . . . . . >: PrintableString PrintableString Barcelona
[...]
Decode paths¶
Each decoded element has so-called decode path: sequence of structure
names it is passing during the decode process. Each element has its own
unique path inside the whole ASN.1 tree. You can print it out with
--print-decode-path
option:
$ python -m pyderasn --schema path.to:Certificate --print-decode-path path/to/file
0 [1,3,1604] Certificate SEQUENCE []
4 [1,3,1453] . tbsCertificate: TBSCertificate SEQUENCE [tbsCertificate]
10-2 [1,1, 1] . . version: [0] EXPLICIT Version INTEGER v3 OPTIONAL [tbsCertificate:version]
13 [1,1, 3] . . serialNumber: CertificateSerialNumber INTEGER 61595 [tbsCertificate:serialNumber]
18 [1,1, 13] . . signature: AlgorithmIdentifier SEQUENCE [tbsCertificate:signature]
20 [1,1, 9] . . . algorithm: OBJECT IDENTIFIER 1.2.840.113549.1.1.5 [tbsCertificate:signature:algorithm]
31 [0,0, 2] . . . parameters: [UNIV 5] ANY OPTIONAL [tbsCertificate:signature:parameters]
. . . . 05:00
33 [0,0, 278] . . issuer: Name CHOICE rdnSequence [tbsCertificate:issuer]
33 [1,3, 274] . . . rdnSequence: RDNSequence SEQUENCE OF [tbsCertificate:issuer:rdnSequence]
37 [1,1, 11] . . . . 0: RelativeDistinguishedName SET OF [tbsCertificate:issuer:rdnSequence:0]
39 [1,1, 9] . . . . . 0: AttributeTypeAndValue SEQUENCE [tbsCertificate:issuer:rdnSequence:0:0]
41 [1,1, 3] . . . . . . type: AttributeType OBJECT IDENTIFIER 2.5.4.6 [tbsCertificate:issuer:rdnSequence:0:0:type]
46 [0,0, 4] . . . . . . value: [UNIV 19] AttributeValue ANY [tbsCertificate:issuer:rdnSequence:0:0:value]
. . . . . . . 13:02:45:53
46 [1,1, 2] . . . . . . . DEFINED BY 2.5.4.6: CountryName PrintableString ES [tbsCertificate:issuer:rdnSequence:0:0:value:DEFINED BY 2.5.4.6]
[...]
Now you can print only the specified tree, for example signature algorithm:
$ python -m pyderasn --schema path.to:Certificate --decode-path-only tbsCertificate:signature path/to/file
18 [1,1, 13] AlgorithmIdentifier SEQUENCE
20 [1,1, 9] . algorithm: OBJECT IDENTIFIER 1.2.840.113549.1.1.5
31 [0,0, 2] . parameters: [UNIV 5] ANY OPTIONAL
. . 05:00