File: //lib64/python3.6/email/__pycache__/_header_value_parser.cpython-36.pyc
3

i@szdZddlZddlZddlmZddlmZddlmZddl	m
Zddl	mZddl	m
Z
ed	Zeed
BZedZeeBZeedZeed
ZeedBedZeeBZeedBZeeBZeedZddhZeeBZddZGdddeZGdddeZGdddeZ GdddeZ!GdddeZ"Gdd d eZ#Gd!d"d"eZ$Gd#d$d$eZ%Gd%d&d&eZ&Gd'd(d(eZ'Gd)d*d*e'Z(Gd+d,d,eZ)Gd-d.d.eZ*Gd/d0d0eZ+Gd1d2d2eZ,Gd3d4d4eZ-Gd5d6d6eZ.Gd7d8d8eZ/Gd9d:d:eZ0Gd;d<d<eZ1Gd=d>d>eZ2Gd?d@d@eZ3GdAdBdBeZ4GdCdDdDeZ5GdEdFdFeZ6GdGdHdHeZ7GdIdJdJeZ8GdKdLdLe!Z9GdMdNdNeZ:GdOdPdPeZ;GdQdRdReZ<GdSdTdTeZ=GdUdVdVe=Z>GdWdXdXeZ?GdYdZdZeZ@Gd[d\d\eZAGd]d^d^eZBGd_d`d`eZCGdadbdbeCZDGdcddddeCZEGdedfdfeZFGdgdhdheZGGdidjdjeZHGdkdldleIZJGdmdndneJZKGdodpdpeJZLGdqdrdreKZMeLddsZNeLdtduZOeLdvdwZPejQdxjRdyjSejTZUejQdzjRdyjSejVd{d|jVd}d~jWZXejQdjYZZejQdzjRdyjSejVd{d|jVd}d~jWZ[ejQdzjRdyjSejVd{d|jVd}d~jWZ\ejQdzjRdyjSejVd{d|jVd}d~jWZ]ddZ^ddZ_ddZ`ddZaddZbddZcddZdddZeddZfddZgddZhddZiddZjddZkddZlddZmddZnddZoddZpddZqddZrddZsddZtddZuddZvddZwddZxddZyddZzddZ{ddZ|ddZ}ddZ~ddÄZddńZddDŽZddɄZdd˄Zdd̈́ZddτZddфZddӄZddՄZddׄZddلZddۄZdd݄Zdd߄ZddZddZddZddZddZddZddZdS)alHeader value parser implementing various email-related RFC parsing rules.

The parsing methods defined in this module implement various email related
parsing rules.  Principal among them is RFC 5322, which is the follow-on
to RFC 2822 and primarily a clarification of the former.  It also implements
RFC 2047 encoded word decoding.
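As a minimal sketch of the RFC 2047 decoding mentioned above, the snippet below feeds one encoded word to get_unstructured, a function defined later in this module.  The sample header value is made up for illustration, and because the module is private (its name starts with an underscore) its behaviour may differ between Python versions.

```python
from email import _header_value_parser as parser

# Hypothetical unstructured header value holding a single RFC 2047
# encoded word (UTF-8 charset, "Q" encoding).
raw_value = "=?utf-8?q?caf=C3=A9?="

token_list = parser.get_unstructured(raw_value)

# The 'value' attribute carries the decoded text of the subtree.
decoded = token_list.value
# The first child is the token created for the encoded word.
first_type = token_list[0].token_type

print(repr(decoded), first_type)
```

The decoded text is read from the 'value' attribute rather than from the raw input, so callers do not need to handle the "=?...?=" syntax themselves.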

RFC 5322 goes to considerable trouble to maintain backward compatibility with
RFC 822 in the parse phase, while cleaning up the structure on the generation
phase.  This parser supports correct RFC 5322 generation by tagging white space
as folding white space only when folding is allowed in the non-obsolete rule
sets.  Actually, the parser is even more generous when accepting input than RFC
5322 mandates, following the spirit of Postel's Law, which RFC 5322 encourages.
Where possible, deviations from the standard are annotated on the 'defects'
attribute of tokens that deviate.

The general structure of the parser follows RFC 5322, and uses its terminology
where there is a direct correspondence.  Where the implementation requires a
somewhat different structure than that used by the formal grammar, new terms
that mimic the closest existing terms are used.  Thus, it really helps to have
a copy of RFC 5322 handy when studying this code.

Input to the parser is a string that has already been unfolded according to
RFC 5322 rules.  According to the RFC this unfolding is the very first step, and
this parser leaves the unfolding step to a higher level message parser, which
will have already detected the line breaks that need unfolding while
determining the beginning and end of each header.

The output of the parser is a TokenList object, which is a list subclass.  A
TokenList is a recursive data structure.  The terminal nodes of the structure
are Terminal objects, which are subclasses of str.  These do not correspond
directly to terminal objects in the formal grammar, but are instead more
practical higher level combinations of true terminals.
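The list-of-strings shape described above can be checked directly.  This is a small sketch, assuming the private module is importable as email._header_value_parser and using an arbitrary sample value; it is not taken from the module's own tests.

```python
from email import _header_value_parser as parser

# Parse an arbitrary unstructured value containing a run of spaces.
token_list = parser.get_unstructured("Hello   world")

# The parse result is a TokenList, which is a list subclass.
is_list = isinstance(token_list, list)
is_token_list = isinstance(token_list, parser.TokenList)

# For this simple input every child is a Terminal, i.e. a str subclass.
children_are_str = all(isinstance(tok, str) for tok in token_list)
child_types = [tok.token_type for tok in token_list]

print(is_list, children_are_str, child_types)
```

Because terminals are plain str subclasses, ordinary string methods work on them, while the surrounding TokenList can be indexed and sliced like any list.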

All TokenList and Terminal objects have a 'value' attribute, which produces the
semantically meaningful value of that part of the parse subtree.  The value of
all whitespace tokens (no matter how many sub-tokens they may contain) is a
single space, as per the RFC rules.  This includes 'CFWS', which is herein
included in the general class of whitespace tokens.  There is one exception to
the rule that whitespace tokens are collapsed into single spaces in values: in
the value of a 'bare-quoted-string' (a quoted-string with no leading or
trailing whitespace), any whitespace that appeared between the quotation marks
is preserved in the returned value.  Note that in all Terminal strings quoted
pairs are turned into their unquoted values.

All TokenList and Terminal objects also have a string value, which attempts to
be a "canonical" representation of the RFC-compliant form of the substring that
produced the parsed subtree, including minimal use of quoted pair quoting.
Whitespace runs are not collapsed.
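The difference between the 'value' attribute and the string form, including the bare-quoted-string exception, can be sketched as follows.  The sample inputs are invented for illustration, and the calls go to the private module, so treat this as a sketch rather than a supported API.

```python
from email import _header_value_parser as parser

# Whitespace tokens collapse to one space in 'value', but str() keeps the run.
unstructured = parser.get_unstructured("Hello   world")
collapsed = unstructured.value
canonical = str(unstructured)

# Inside a bare-quoted-string the whitespace is preserved in 'value' too.
quoted, remainder = parser.get_bare_quoted_string('"a   b"')
inner_value = quoted.value
quoted_text = str(quoted)

print(repr(collapsed), repr(canonical), repr(inner_value), repr(quoted_text))
```

In short, 'value' is meant for semantic comparison, while str() is meant for regenerating header text.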

Comment tokens also have a 'content' attribute providing the string found
between the parens (including any nested comments) with whitespace preserved.

All TokenList and Terminal objects have a 'defects' attribute which is a
possibly empty list of all the defects found while creating the token.  Defects
may appear on any token in the tree, and a composite list of all defects in the
subtree is available through the 'all_defects' attribute of any node.  (For
Terminal nodes, x.defects == x.all_defects.)
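A short sketch of how 'defects' and 'all_defects' relate, assuming the private module is importable as shown.  The input "user" is a deliberately incomplete address chosen for illustration: it has a local part but no "@domain", so the parser is expected to record a defect on the nested addr-spec token rather than on the top-level mailbox.

```python
from email import errors
from email import _header_value_parser as parser

# Deliberately malformed input: a local part with no "@domain".
mailbox, remainder = parser.get_mailbox("user")

# Defects attached directly to the top-level token.
own_defects = mailbox.defects
# Defects collected from the whole subtree below (and including) the token.
subtree_defects = mailbox.all_defects

has_invalid_header_defect = any(
    isinstance(defect, errors.InvalidHeaderDefect)
    for defect in subtree_defects
)

print(own_defects, [type(d).__name__ for d in subtree_defects])
```

Checking 'all_defects' on the root is therefore the simplest way to learn whether anything in a parsed value deviated from the grammar.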

Each object in a parse tree is called a 'token', and each has a 'token_type'
attribute that gives the name from the RFC 5322 grammar that it represents.
Not all RFC 5322 nodes are produced, and there is one non-RFC 5322 node that
may be produced: 'ptext'.  A 'ptext' is a string of printable ASCII characters.
It is returned in place of lists of (ctext/quoted-pair) and
(qtext/quoted-pair).
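To see which grammar names end up in a parse tree, one can walk it and collect every 'token_type'.  The helper collect_token_types below is hypothetical (it is not part of the module), and the address is a made-up example.

```python
from email import _header_value_parser as parser


def collect_token_types(token):
    """Return the token_type of a token and of every token below it."""
    types = [token.token_type]
    # TokenList objects are lists; Terminal objects are strings and have
    # no children to descend into.
    if isinstance(token, list):
        for child in token:
            types.extend(collect_token_types(child))
    return types


mailbox, remainder = parser.get_mailbox("Fred <fred@example.com>")
token_types = collect_token_types(mailbox)

display_name = mailbox.display_name
addr_spec = mailbox.addr_spec

print(token_types)
print(display_name, addr_spec)
```

The convenience attributes such as display_name and addr_spec are usually enough for callers; walking the tree is mainly useful when studying the parser against RFC 5322.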

XXX: provide complete list of token types.
N)	hexdigits)OrderedDict)
itemgetter)_encoded_words)errors)utilsz 	(z
()<>@,:;.\"[].z."(z/?=z*'%%

cCs dt|jddjdddS)N"\z\\z\")strreplace)valuer2/usr/lib64/python3.6/email/_header_value_parser.pyquote_stringbsrcseZdZdZdZdZfddZddZfddZe	d	d
Z
e	ddZd
dZe	ddZ
e	ddZddZdddZdddZdddZZS)	TokenListNTcstj||g|_dS)N)super__init__defects)selfargskw)	__class__rrroszTokenList.__init__cCsdjdd|DS)Ncss|]}t|VqdS)N)r).0xrrr	<genexpr>tsz$TokenList.__str__.<locals>.<genexpr>)join)rrrr__str__sszTokenList.__str__csdj|jjtjS)Nz{}({}))formatr__name__r__repr__)r)rrrr%vs
zTokenList.__repr__cCsdjdd|DS)Nrcss|]}|jr|jVqdS)N)r)rrrrrr |sz"TokenList.value.<locals>.<genexpr>)r!)rrrrrzszTokenList.valuecCstdd|D|jS)Ncss|]}|jVqdS)N)all_defects)rrrrrr sz(TokenList.all_defects.<locals>.<genexpr>)sumr)rrrrr&~szTokenList.all_defectscCs|djS)Nr)startswith_fws)rrrrr(szTokenList.startswith_fwscCstdd|DS)zATrue if all top level tokens of this part may be RFC2047 encoded.css|]}|jVqdS)N)
as_ew_allowed)rpartrrrr sz*TokenList.as_ew_allowed.<locals>.<genexpr>)all)rrrrr)szTokenList.as_ew_allowedcCs"g}x|D]}|j|jq
W|S)N)extendcomments)rr-tokenrrrr-s
zTokenList.commentscCst||dS)N)policy)_refold_parse_tree)rr/rrrfoldszTokenList.foldrcCst|j|ddS)N)indent)printppstr)rr2rrrpprintszTokenList.pprintcCsdj|j|dS)Nr)r2)r!_pp)rr2rrrr4szTokenList.ppstrccs~dj||jj|jVx<|D]4}t|ds<|dj|Vq|j|dEdHqW|jrhdj|j}nd}dj||VdS)Nz{}{}/{}(r6z*    !! invalid element in token list: {!r}z    z Defects: {}rz{}){})r#rr$
token_typehasattrr6r)rr2r.Zextrarrrr6s


z
TokenList._pp)r)r)r)r$
__module____qualname__r7syntactic_breakew_combine_allowedrr"r%propertyrr&r(r)r-r1r5r4r6
__classcell__rr)rrris

rc@s$eZdZeddZeddZdS)WhiteSpaceTokenListcCsdS)N r)rrrrrszWhiteSpaceTokenList.valuecCsdd|DS)NcSsg|]}|jdkr|jqS)comment)r7content)rrrrr
<listcomp>sz0WhiteSpaceTokenList.comments.<locals>.<listcomp>r)rrrrr-szWhiteSpaceTokenList.commentsN)r$r9r:r=rr-rrrrr?sr?c@seZdZdZdS)UnstructuredTokenListunstructuredN)r$r9r:r7rrrrrDsrDc@seZdZdZdS)PhrasephraseN)r$r9r:r7rrrrrFsrFc@seZdZdZdS)WordZwordN)r$r9r:r7rrrrrHsrHc@seZdZdZdS)CFWSListcfwsN)r$r9r:r7rrrrrIsrIc@seZdZdZdS)AtomatomN)r$r9r:r7rrrrrKsrKc@seZdZdZdZdS)Tokenr.FN)r$r9r:r7Zencode_as_ewrrrrrMsrMc@seZdZdZdZdZdZdS)EncodedWordzencoded-wordN)r$r9r:r7ctecharsetlangrrrrrNsrNc@s4eZdZdZeddZeddZeddZdS)	QuotedStringz
quoted-stringcCs"x|D]}|jdkr|jSqWdS)Nzbare-quoted-string)r7r)rrrrrrBs

zQuotedString.contentcCsBg}x2|D]*}|jdkr(|jt|q
|j|jq
Wdj|S)Nzbare-quoted-stringr)r7appendrrr!)rresrrrrquoted_values

zQuotedString.quoted_valuecCs"x|D]}|jdkr|jSqWdS)Nzbare-quoted-string)r7r)rr.rrrstripped_values

zQuotedString.stripped_valueN)r$r9r:r7r=rBrUrVrrrrrRs
rRc@s$eZdZdZddZeddZdS)BareQuotedStringzbare-quoted-stringcCstdjdd|DS)Nrcss|]}t|VqdS)N)r)rrrrrr sz+BareQuotedString.__str__.<locals>.<genexpr>)rr!)rrrrr"szBareQuotedString.__str__cCsdjdd|DS)Nrcss|]}t|VqdS)N)r)rrrrrr sz)BareQuotedString.value.<locals>.<genexpr>)r!)rrrrrszBareQuotedString.valueN)r$r9r:r7r"r=rrrrrrWsrWc@s8eZdZdZddZddZeddZedd	Zd
S)CommentrAcs(djtdgfddDdgggS)Nrrcsg|]}j|qSr)quote)rr)rrrrCsz#Comment.__str__.<locals>.<listcomp>))r!r')rr)rrr"s
zComment.__str__cCs2|jdkrt|St|jddjddjddS)NrArz\\rz\(rZz\))r7rr)rrrrrrYs

z
Comment.quotecCsdjdd|DS)Nrcss|]}t|VqdS)N)r)rrrrrr sz"Comment.content.<locals>.<genexpr>)r!)rrrrrBszComment.contentcCs|jgS)N)rB)rrrrr-szComment.commentsN)	r$r9r:r7r"rYr=rBr-rrrrrXs
rXc@s4eZdZdZeddZeddZeddZdS)	AddressListzaddress-listcCsdd|DS)NcSsg|]}|jdkr|qS)address)r7)rrrrrrC$sz)AddressList.addresses.<locals>.<listcomp>r)rrrr	addresses"szAddressList.addressescCstdd|DgS)Ncss|]}|jdkr|jVqdS)r\N)r7	mailboxes)rrrrrr (sz(AddressList.mailboxes.<locals>.<genexpr>)r')rrrrr^&szAddressList.mailboxescCstdd|DgS)Ncss|]}|jdkr|jVqdS)r\N)r7
all_mailboxes)rrrrrr -sz,AddressList.all_mailboxes.<locals>.<genexpr>)r')rrrrr_+szAddressList.all_mailboxesN)r$r9r:r7r=r]r^r_rrrrr[sr[c@s4eZdZdZeddZeddZeddZdS)	Addressr\cCs|djdkr|djSdS)Nrgroup)r7display_name)rrrrrb5szAddress.display_namecCs4|djdkr|dgS|djdkr*gS|djS)Nrmailboxzinvalid-mailbox)r7r^)rrrrr^:s

zAddress.mailboxescCs:|djdkr|dgS|djdkr0|dgS|djS)Nrrczinvalid-mailbox)r7r_)rrrrr_Bs


zAddress.all_mailboxesN)r$r9r:r7r=rbr^r_rrrrr`1sr`c@s(eZdZdZeddZeddZdS)MailboxListzmailbox-listcCsdd|DS)NcSsg|]}|jdkr|qS)rc)r7)rrrrrrCPsz)MailboxList.mailboxes.<locals>.<listcomp>r)rrrrr^NszMailboxList.mailboxescCsdd|DS)NcSsg|]}|jdkr|qS)rcinvalid-mailbox)rcre)r7)rrrrrrCTsz-MailboxList.all_mailboxes.<locals>.<listcomp>r)rrrrr_RszMailboxList.all_mailboxesN)r$r9r:r7r=r^r_rrrrrdJsrdc@s(eZdZdZeddZeddZdS)	GroupListz
group-listcCs"|s|djdkrgS|djS)Nrzmailbox-list)r7r^)rrrrr^\szGroupList.mailboxescCs"|s|djdkrgS|djS)Nrzmailbox-list)r7r_)rrrrr_bszGroupList.all_mailboxesN)r$r9r:r7r=r^r_rrrrrfXsrfc@s4eZdZdZeddZeddZeddZdS)	GroupracCs|djdkrgS|djS)Nz
group-list)r7r^)rrrrr^mszGroup.mailboxescCs|djdkrgS|djS)Nrhz
group-list)r7r_)rrrrr_sszGroup.all_mailboxescCs
|djS)Nr)rb)rrrrrbyszGroup.display_nameN)r$r9r:r7r=r^r_rbrrrrrgisrgc@sLeZdZdZeddZeddZeddZedd	Zed
dZ	dS)
NameAddrz	name-addrcCst|dkrdS|djS)Nr)lenrb)rrrrrbszNameAddr.display_namecCs
|djS)Nrj)
local_part)rrrrrmszNameAddr.local_partcCs
|djS)Nrjrl)domain)rrrrrnszNameAddr.domaincCs
|djS)Nrjrl)route)rrrrroszNameAddr.routecCs
|djS)Nrjrl)	addr_spec)rrrrrpszNameAddr.addr_specN)
r$r9r:r7r=rbrmrnrorprrrrri~sric@s@eZdZdZeddZeddZeddZedd	Zd
S)	AngleAddrz
angle-addrcCs"x|D]}|jdkr|jSqWdS)Nz	addr-spec)r7rm)rrrrrrms

zAngleAddr.local_partcCs"x|D]}|jdkr|jSqWdS)Nz	addr-spec)r7rn)rrrrrrns

zAngleAddr.domaincCs"x|D]}|jdkr|jSqWdS)Nz	obs-route)r7domains)rrrrrros

zAngleAddr.routecCs<x6|D]*}|jdkr|jr |jSt|j|jSqWdSdS)Nz	addr-specz<>)r7rmrpr)rrrrrrps

zAngleAddr.addr_specN)	r$r9r:r7r=rmrnrorprrrrrqs
rqc@seZdZdZeddZdS)ObsRoutez	obs-routecCsdd|DS)NcSsg|]}|jdkr|jqS)rn)r7rn)rrrrrrCsz$ObsRoute.domains.<locals>.<listcomp>r)rrrrrrszObsRoute.domainsN)r$r9r:r7r=rrrrrrrssrsc@sLeZdZdZeddZeddZeddZedd	Zed
dZ	dS)
MailboxrccCs|djdkr|djSdS)Nrz	name-addr)r7rb)rrrrrbszMailbox.display_namecCs
|djS)Nr)rm)rrrrrmszMailbox.local_partcCs
|djS)Nr)rn)rrrrrnszMailbox.domaincCs|djdkr|djSdS)Nrz	name-addr)r7ro)rrrrrosz
Mailbox.routecCs
|djS)Nr)rp)rrrrrpszMailbox.addr_specN)
r$r9r:r7r=rbrmrnrorprrrrrtsrtc@s,eZdZdZeddZeZZZZ	dS)InvalidMailboxzinvalid-mailboxcCsdS)Nr)rrrrrbszInvalidMailbox.display_nameN)
r$r9r:r7r=rbrmrnrorprrrrrusrucs(eZdZdZdZefddZZS)DomainrnFcsdjtjjS)Nr)r!rrsplit)r)rrrrnsz
Domain.domain)r$r9r:r7r)r=rnr>rr)rrrvsrvc@seZdZdZdS)DotAtomzdot-atomN)r$r9r:r7rrrrrxsrxc@seZdZdZdZdS)DotAtomTextz
dot-atom-textTN)r$r9r:r7r)rrrrrysryc@sDeZdZdZdZeddZeddZeddZed	d
Z	dS)AddrSpecz	addr-specFcCs
|djS)Nr)rm)rrrrrmszAddrSpec.local_partcCst|dkrdS|djS)Nrjrl)rkrn)rrrrrnszAddrSpec.domaincCs<t|dkr|djS|djj|dj|djjS)Nr{rrjrh)rkrrstriplstrip)rrrrrs
zAddrSpec.valuecCsLt|j}t|t|tkr*t|j}n|j}|jdk	rH|d|jS|S)N@)setrmrk
DOT_ATOM_ENDSrrn)rZnamesetZlprrrrps

zAddrSpec.addr_specN)
r$r9r:r7r)r=rmrnrrprrrrrzsrzc@seZdZdZdZdS)ObsLocalPartzobs-local-partFN)r$r9r:r7r)rrrrr srcs4eZdZdZdZeddZefddZZS)DisplayNamezdisplay-nameFcCst|}|djdkr"|jdn*|ddjdkrLt|ddd|d<|djdkrd|jn*|ddjdkrt|ddd|d	<|jS)
NrrJrjrlrlrlrlrlrl)rr7popr)rrTrrrrb+s
zDisplayName.display_namecsd}|jrd}nx|D]}|jdkrd}qW|rd}}|djdksX|ddjdkr\d}|d	jdks||d
djdkrd}|t|j|StjSdS)NFTz
quoted-stringrrrJr@rjrlrlrl)rr7rrbrr)rrYrZpreZpost)rrrr:s

  zDisplayName.value)	r$r9r:r7r<r=rbrr>rr)rrr&src@s,eZdZdZdZeddZeddZdS)	LocalPartz
local-partFcCs&|djdkr|djS|djSdS)Nrz
quoted-string)r7rUr)rrrrrSs
zLocalPart.valuecCstg}t}d}x|dtgD]}|jdkr.q|r^|jdkr^|djdkr^t|dd|d<t|t}|r|jdkr|djdkr|jt|ddn
|j||d	}|}qWt|dd
}|jS)NFrrJdotrjrlrlrlrlrl)DOTr7r
isinstancerSr)rrTZlastZ
last_is_tltokZis_tlrrrrmZs$


zLocalPart.local_partN)r$r9r:r7r)r=rrmrrrrrNsrcs4eZdZdZdZefddZeddZZS)
DomainLiteralzdomain-literalFcsdjtjjS)Nr)r!rrrw)r)rrrrnwszDomainLiteral.domaincCs"x|D]}|jdkr|jSqWdS)Nptext)r7r)rrrrrip{s

zDomainLiteral.ip)	r$r9r:r7r)r=rnrr>rr)rrrrsrc@seZdZdZdZdZdS)MIMEVersionzmime-versionN)r$r9r:r7majorminorrrrrrsrc@s4eZdZdZdZdZdZeddZeddZ	dS)		Parameter	parameterFzus-asciicCs|jr|djSdS)Nrjr)	sectionednumber)rrrrsection_numberszParameter.section_numbercCsbx\|D]T}|jdkr|jS|jdkrx4|D],}|jdkr*x|D]}|jdkr>|jSq>Wq*WqWdS)Nrz
quoted-stringzbare-quoted-stringr)r7rV)rr.rrrparam_values






zParameter.param_valueN)
r$r9r:r7rextendedrPr=rrrrrrrsrc@seZdZdZdS)InvalidParameterzinvalid-parameterN)r$r9r:r7rrrrrsrc@seZdZdZeddZdS)	Attribute	attributecCs$x|D]}|jjdr|jSqWdS)Nattrtext)r7endswithr)rr.rrrrVs
zAttribute.stripped_valueN)r$r9r:r7r=rVrrrrrsrc@seZdZdZdZdS)SectionsectionN)r$r9r:r7rrrrrrsrc@seZdZdZeddZdS)ValuercCs2|d}|jdkr|d}|jjdr,|jS|jS)NrrJrj
quoted-stringrextended-attribute)rrr)r7rrVr)rr.rrrrVs
zValue.stripped_valueN)r$r9r:r7r=rVrrrrrsrc@s(eZdZdZdZeddZddZdS)MimeParameterszmime-parametersFccst}x\|D]T}|jjdsq|djdkr.q|djj}||krLg||<||j|j|fqWx|jD]\}}t|t	dd}|dd}|j
}|jrt|dkr|dddkr|ddj
jtjd|dd}g}d}x|D]\}	}
|	|kr6|
js$|
j
jtjdqn|
j
jtjd|d7}|
j}|
jrytjj|}Wn&tk
rtjj|d	d
}YnRXy|j|d}Wn"tk
r|jdd}YnXtj|r|
j
jtj|j|qWd
j|}||fVqpWdS)Nrrr)keyrjz.duplicate parameter name; duplicate(s) ignoredz+duplicate parameter name; duplicate ignoredz(inconsistent RFC2231 parameter numberingzlatin-1)encodingsurrogateescapezus-asciir)rr7rrstriprSritemssortedrrPrrkrrInvalidHeaderDefectrurllibparseZunquote_to_bytesUnicodeEncodeErrorZunquotedecodeLookupErrorr_has_surrogatesUndecodableBytesDefectr!)rparamsr.namepartsZfirst_paramrPZvalue_partsirparamrrrrrsZ




zMimeParameters.paramscCsXg}x8|jD].\}}|r0|jdj|t|q|j|qWdj|}|rTd|SdS)Nz{}={}z; r@r)rrSr#rr!)rrrrrrrr"s
zMimeParameters.__str__N)r$r9r:r7r;r=rr"rrrrrsFrc@seZdZdZeddZdS)ParameterizedHeaderValueFcCs&x t|D]}|jdkr
|jSq
WiS)Nzmime-parameters)reversedr7r)rr.rrrr%s

zParameterizedHeaderValue.paramsN)r$r9r:r;r=rrrrrrsrc@seZdZdZdZdZdZdS)ContentTypezcontent-typeFtextZplainN)r$r9r:r7r)maintypesubtyperrrrr-src@seZdZdZdZdZdS)ContentDispositionzcontent-dispositionFN)r$r9r:r7r)content_dispositionrrrrr5src@seZdZdZdZdZdS)ContentTransferEncodingzcontent-transfer-encodingFZ7bitN)r$r9r:r7r)rOrrrrr<src@seZdZdZdZdS)HeaderLabelzheader-labelFN)r$r9r:r7r)rrrrrCsrc@seZdZdZdS)HeaderheaderN)r$r9r:r7rrrrrIsrcsreZdZdZdZdZfddZfddZddZe	dd	Z
dfdd	Zd
dZe	ddZ
ddZZS)TerminalTcstj||}||_g|_|S)N)r__new__r7r)clsrr7r)rrrrXszTerminal.__new__csdj|jjtjS)Nz{}({}))r#rr$rr%)r)rrrr%^szTerminal.__repr__cCst|jjd|jdS)N/)r3rr$r7)rrrrr5aszTerminal.pprintcCs
t|jS)N)listr)rrrrr&dszTerminal.all_defectsrcs2dj||jj|jtj|js"dn
dj|jgS)Nz
{}{}/{}({}){}rz {})r#rr$r7rr%r)rr2)rrrr6hszTerminal._ppcCsdS)Nr)rrrrpop_trailing_wsqszTerminal.pop_trailing_wscCsgS)Nr)rrrrr-uszTerminal.commentscCst||jfS)N)rr7)rrrr__getnewargs__yszTerminal.__getnewargs__)r)r$r9r:r)r<r;rr%r5r=r&r6rr-rr>rr)rrrRs	rc@s eZdZeddZddZdS)WhiteSpaceTerminalcCsdS)Nr@r)rrrrrszWhiteSpaceTerminal.valuecCsdS)NTr)rrrrr(sz!WhiteSpaceTerminal.startswith_fwsN)r$r9r:r=rr(rrrrr}src@s eZdZeddZddZdS)
ValueTerminalcCs|S)Nr)rrrrrszValueTerminal.valuecCsdS)NFr)rrrrr(szValueTerminal.startswith_fwsN)r$r9r:r=rr(rrrrrsrc@s eZdZeddZddZdS)EWWhiteSpaceTerminalcCsdS)Nrr)rrrrrszEWWhiteSpaceTerminal.valuecCsdS)Nrr)rrrrr"szEWWhiteSpaceTerminal.__str__N)r$r9r:r=rr"rrrrrsrr,zlist-separatorr~zroute-component-markerz([{}]+)rz[^{}]+rz\\]z\]z[\x00-\x20\x7F]cCs>t|}|r|jjtj|tj|r:|jjtjddS)z@If input token contains ASCII non-printables, register a defect.z*Non-ASCII characters found in header tokenN)_non_printable_finderrrSrZNonPrintableDefectrrr)xtextZnon_printablesrrr_validate_xtexts

rcCst|d^}}g}d}d}xbtt|D]J}||dkrL|rFd}d}nd}q(|rVd}n|||krdP|j||q(W|d}dj|dj||dg||fS)akScan printables/quoted-pairs until endchars and return unquoted ptext.

    This function turns a run of qcontent, ccontent-without-comments, or
    dtext-with-quoted-printables into a single string by unquoting any
    quoted printables.  It returns the string, the remaining value, and
    a flag that is True iff there were any quoted printables decoded.

    rjFrTrN)
_wsp_splitterrangerkrSr!)rendcharsZfragment	remainderZvcharsescapehad_qpposrrr_get_ptext_to_endcharss$	rcCs.|j}t|dt|t|d}||fS)zFWS = 1*WSP

    This isn't the RFC definition.  We're using fws to represent tokens where
    folding can be done, but when we are parsing the *un*folding has already
    been done so we don't need to watch out for CRLF.

    Nfws)r}rrk)rZnewvaluerrrrget_fwssrc
Cst}|jds tjdj||ddjdd^}}||ddkrXtjdj|dj|}t|dkr|dtkr|dtkr|jdd^}}|d|}t|jdkr|j	j
tjd	||_dj|}yt
jd|d\}}}}	Wn(tk
rtjd
j|jYnX||_||_|j	j|	xh|r|dtkrdt|\}
}|j
|
q6t|d^}}t|d}t||j
|dj|}q6W||fS)zE encoded-word = "=?" charset "?" encoding "?" encoded-text "?="

    z=?z"expected encoded word but found {}rhNz?=rjrrzwhitespace inside encoded wordz!encoded word format invalid: '{}'vtext)rN
startswithrHeaderParseErrorr#rwr!rkrrrSrrO_ewr
ValueErrorrPrQr,WSPrrrr)
rewrrZremstrrestrrPrQrr.charsrrrrget_encoded_wordsH

$




rcCst}x|r|dtkr4t|\}}|j|q
|jdryt|\}}Wntjk
rdYnrXd}t|dkr|dj	dkr|j
jtjdd}|rt|dkr|d
j	d	krt|dd|d<|j|q
t
|d^}}t|d
}t||j|dj|}q
W|S)aOunstructured = (*([FWS] vchar) *WSP) / obs-unstruct
       obs-unstruct = *((*LF *CR *(obs-utext *LF *CR)) / FWS)
       obs-utext = %d0 / obs-NO-WS-CTL / LF / CR

       obs-NO-WS-CTL is control characters except WSP/CR/LF.

    So, basically, we have printable runs, plus control characters or nulls in
    the obsolete syntax, separated by whitespace.  Since RFC 2047 uses the
    obsolete syntax in its specification, but requires whitespace on either
    side of the encoded words, I can see no reason to need to separate the
    non-printable-non-whitespace from the printable runs if they occur, so we
    parse this into xtext tokens separated by WSP tokens.

    Because an 'unstructured' value must by definition constitute the entire
    value, this 'get' routine does not return a remaining value, only the
    parsed TokenList.

    rz=?Trjrz&missing whitespace before encoded wordFrhzencoded-wordrrrlrlrl)rDrrrSrrrrrkr7rrrrrrr!)rrEr.Zhave_wsrrrrrrget_unstructured!s:






rcCs*t|d\}}}t|d}t|||fS)actext = <printable ascii except \ ( )>

    This is not the RFC ctext, since we are handling nested comments in comment
    and unquoting quoted-pairs here.  We allow anything except the '()'
    characters, but if we find any ASCII other than the RFC defined printable
    ASCII, a NonPrintableDefect is added to the token's defects list.  Since
    quoted pairs are converted to their unquoted values, what is returned is
    a 'ptext' token.  In this case it is a WhiteSpaceTerminal, so its value
    is ' '.

    z()r)rrr)rr_rrrget_qp_ctextYs
rcCs*t|d\}}}t|d}t|||fS)aoqcontent = qtext / quoted-pair

    We allow anything except the DQUOTE character, but if we find any ASCII
    other than the RFC defined printable ASCII, a NonPrintableDefect is
    added to the token's defects list.  Any quoted pairs are converted to their
    unquoted values, so what is returned is a 'ptext' token.  In this case it
    is a ValueTerminal.

    r
r)rrr)rrrrrrget_qcontentjs

rcCsNt|}|stjdj||j}|t|d}t|d}t|||fS)zatext = <matches _atext_matcher>

    We allow any non-ATOM_ENDS in atext, but add an InvalidATextDefect to
    the token's defects list if we find non-atext characters.
    zexpected atext but found '{}'Natext)_non_atom_end_matcherrrr#rarkrr)rmrrrr	get_atextys
rcCs|ddkrtjdj|t}|dd}|ddkrPt|\}}|j|x|r|ddkr|dtkr|t|\}}nd|dddkry"t|\}}|j	jtj
dWqtjk
rt|\}}YqXnt|\}}|j|qRW|s|j	jtj
d	||fS||ddfS)
zbare-quoted-string = DQUOTE *([FWS] qcontent) [FWS] DQUOTE

    A quoted-string without the leading or trailing white space.  Its
    value is the text between the quote marks, with whitespace
    preserved and quoted pairs decoded.
    rr
zexpected '"' but found '{}'rjNrhz=?z!encoded word inside quoted stringz"end of header inside quoted string)rrr#rWrrSrrrrr)rZbare_quoted_stringr.rrrget_bare_quoted_strings2


rcCs|r |ddkr tjdj|t}|dd}x^|r|ddkr|dtkr^t|\}}n&|ddkrxt|\}}nt|\}}|j|q4W|s|j	jtj
d||fS||ddfS)zcomment = "(" *([FWS] ccontent) [FWS] ")"
       ccontent = ctext / quoted-pair / comment

    We handle nested comments here, and quoted-pair in our qp-ctext routine.
    rrzexpected '(' but found '{}'rjNrZzend of header inside comment)rrr#rXrrget_commentrrSrr)rrAr.rrrrs"
rcCsTt}xD|rJ|dtkrJ|dtkr2t|\}}nt|\}}|j|qW||fS)z,CFWS = (1*([FWS] comment) [FWS]) / FWS

    r)rICFWS_LEADERrrrrS)rrJr.rrrget_cfwssrcCspt}|r,|dtkr,t|\}}|j|t|\}}|j||rh|dtkrht|\}}|j|||fS)zquoted-string = [CFWS] <bare-quoted-string> [CFWS]

    'bare-quoted-string' is an intermediate class defined by this
    parser and not by the RFC grammar.  It is the quoted string
    without any attached CFWS.
    r)rRrrrSr)rZ
quoted_stringr.rrrget_quoted_strings


rcCst}|r,|dtkr,t|\}}|j||rL|dtkrLtjdj||jdryt	|\}}Wqtjk
rt
|\}}YqXnt
|\}}|j||r|dtkrt|\}}|j|||fS)zPatom = [CFWS] 1*atext [CFWS]

    An atom could be an rfc2047 encoded word.
    rzexpected atom but found '{}'z=?)rKrrrS	ATOM_ENDSrrr#rrr)rrLr.rrrget_atoms$



rcCst}|s|dtkr(tjdj|xP|rx|dtkrxt|\}}|j||r*|ddkr*|jt|dd}q*W|dtkrtjdjd|||fS)z( dot-text = 1*atext *("." 1*atext)

    rz8expected atom at a start of dot-atom-text but found '{}'r	rjNz4expected atom at end of dot-atom-text but found '{}'rl)ryrrrr#rrSr)rZ
dot_atom_textr.rrrget_dot_atom_texts

rcCst}|dtkr(t|\}}|j||jdrhyt|\}}Wqttjk
rdt|\}}YqtXnt|\}}|j||r|dtkrt|\}}|j|||fS)z dot-atom = [CFWS] dot-atom-text [CFWS]

    Any place we can have a dot atom, we could instead have an rfc2047 encoded
    word.
    rz=?)	rxrrrSrrrrr)rZdot_atomr.rrrget_dot_atoms



rcCs|dtkrt|\}}nd}|ddkr8t|\}}n*|dtkrVtjdj|nt|\}}|dk	rx|g|dd<||fS)aword = atom / quoted-string

    Either atom or quoted-string may start with CFWS.  We have to peel off this
    CFWS first to determine which type of word to parse.  Afterward we splice
    the leading CFWS, if any, into the parsed sub-token.

    If neither an atom nor a quoted-string is found before the next special, a
    HeaderParseError is raised.

    The token returned is either an Atom or a QuotedString, as appropriate.
    This means the 'word' level of the formal grammar is not represented in the
    parse tree; this is because having that extra layer when manipulating the
    parse tree is more confusing than it is helpful.

    rNr
z1Expected 'atom' or 'quoted-string' but found '{}')rrrSPECIALSrrr#r)rleaderr.rrrget_word*s
rcCst}yt|\}}|j|Wn(tjk
rH|jjtjdYnXx|r|dtkr|ddkr|jt|jjtj	d|dd}qLyt|\}}WnDtjk
r|dt
krt|\}}|jjtj	dnYnX|j|qLW||fS)a phrase = 1*word / obs-phrase
        obs-phrase = word *(word / "." / CFWS)

    This means a phrase can be a sequence of words, periods, and CFWS in any
    order as long as it starts with at least one word.  If anything other than
    words is detected, an ObsoleteHeaderDefect is added to the token's defect
    list.  We also accept a phrase that starts with CFWS followed by a dot;
    this is registered as an InvalidHeaderDefect, since it is not supported by
    even the obsolete grammar.

    zphrase does not start with wordrr	zperiod in 'phrase'rjNzcomment found without atom)rFrrSrrrrPHRASE_ENDSrObsoleteHeaderDefectrr)rrGr.rrr
get_phraseIs.




rcCstt}d}|dtkr"t|\}}|s6tjdj|yt|\}}Wn^tjk
ryt|\}}Wn6tjk
r|ddkr|dtkrt	}YnXYnX|dk	r|g|dd<|j
||o|ddks|dtkr2tt||\}}|j
dkr|jj
tjdn|jj
tjd||d<y|jjdWn(tk
rj|jj
tjd	YnX||fS)
z= local-part = dot-atom / quoted-string / obs-local-part

    Nrz"expected local-part but found '{}'rzinvalid-obs-local-partz<local-part is not dot-atom, quoted-string, or obs-local-partz,local-part is not a dot-atom (contains CFWS)asciiz)local-part contains non-ASCII characters))rrrrrr#rrrrrSget_obs_local_partrr7rrrrencoderZNonASCIILocalPartDefect)rrmrr.obs_local_partrrrget_local_partosB




rcCst}d}x|o(|ddks,|dtkr*|ddkrl|rN|jjtjd|jtd}|dd}qnD|ddkr|jt|dd	|dd}|jjtjd
d}q|r|djdkr|jjtjdyt	|\}}d}Wn4tj
k
r|dtkrt|\}}YnX|j|qW|djdks\|djd
krn|djdkrn|jjtjd|djdks|djd
kr|djdkr|jjtjd|jrd|_||fS)z' obs-local-part = word *("." word)
    Frrr	zinvalid repeated '.'TrjNzmisplaced-specialz/'\' character outside of quoted-string/ccontentrzmissing '.' between wordsrJz!Invalid leading '.' in local partrhz"Invalid trailing '.' in local partzinvalid-obs-local-partrlrlrlr)
rrrrSrrrrr7rrrr)rrZlast_non_ws_was_dotr.rrrrsV"





rcCs@t|d\}}}t|d}|r0|jjtjdt|||fS)a dtext = <printable ascii except \ [ ]> / obs-dtext
        obs-dtext = obs-NO-WS-CTL / quoted-pair

    We allow anything except the excluded characters, but if we find any
    ASCII other than the RFC defined printable ASCII, a NonPrintableDefect is
    added to the token's defects list.  Quoted pairs are converted to their
    unquoted values, so what is returned is a ptext token, in this case a
    ValueTerminal.  If there were quoted-printables, an ObsoleteHeaderDefect is
    added to the returned token's defect list.

    z[]rz(quoted printable found in domain-literal)rrrrSrrr)rrrrrr	get_dtexts

rcCs,|rdS|jtjd|jtdddS)NFz"end of input inside domain-literalrzdomain-literal-endT)rSrrr)rdomain_literalrrr_check_for_early_dl_endsrcCslt}|dtkr(t|\}}|j||s6tjd|ddkrRtjdj||dd}t||rp||fS|jtdd|dt	krt
|\}}|j|t|\}}|j|t||r||fS|dt	krt
|\}}|j|t||r||fS|ddkrtjd	j||jtdd
|dd}|rd|dtkrdt|\}}|j|||fS)zB domain-literal = [CFWS] "[" *([FWS] dtext) [FWS] "]" [CFWS]

    rzexpected domain-literal[z6expected '[' at start of domain-literal but found '{}'rjNzdomain-literal-startrz4expected ']' at end of domain-literal but found '{}'zdomain-literal-end)rrrrSrrr#rrrrr)rrr.rrrget_domain_literalsD







rcCstt}d}|dtkr"t|\}}|s6tjdj||ddkrvt|\}}|dk	rd|g|dd<|j|||fSyt|\}}Wn"tjk
rt	|\}}YnX|r|ddkrtjd|dk	r|g|dd<|j||o|ddkrl|j
jtjd|djd	kr(|d|dd<xB|rj|ddkrj|jt
t	|d
d\}}|j|q*W||fS)z] domain = dot-atom / domain-literal / obs-domain
        obs-domain = atom *("." atom)

    Nrzexpected domain but found '{}'rr~zInvalid Domainr	z(domain is not a dot-atom (contains CFWS)zdot-atomrj)rvrrrrr#rrSrrrrr7r)rrnrr.rrr
get_domains@




rcCs~t}t|\}}|j||s.|ddkrH|jjtjd||fS|jtddt|dd\}}|j|||fS)z( addr-spec = local-part "@" domain

    rr~z"add-spec local part with no domainzaddress-at-symbolrjN)rzrrSrrrrr)rrpr.rrr
get_addr_spec.s


rcCst}xf|rl|ddks$|dtkrl|dtkrHt|\}}|j|q|ddkr|jt|dd}qW|s|ddkrtjdj||jtt	|dd\}}|j|x|o|ddkrB|jt|dd}|sP|dtkrt|\}}|j||ddkr|jtt	|dd\}}|j|qW|sTtjd|ddkrrtjd	j||jt
dd
||ddfS)z obs-route = obs-domain-list ":"
        obs-domain-list = *(CFWS / ",") "@" domain *("," [CFWS] ["@" domain])

        Returns an obs-route token with the appropriate sub-tokens (that is,
        there is no obs-domain-list in the parse tree).
    rrrjNr~z(expected obs-route domain but found '{}'z%end of header while parsing obs-route:z4expected ':' marking end of obs-route but found '{}'zend-of-obs-route-marker)rsrrrS
ListSeparatorrrr#RouteComponentMarkerrr)rZ	obs_router.rrr
get_obs_route>sB






r
cCst}|dtkr(t|\}}|j||s:|ddkrJtjdj||jtdd|dd}|ddkr|jtdd|jjtj	d	|dd}||fSyt
|\}}Wnztjk
r2y"t|\}}|jjtjd
Wn(tjk
rtjdj|YnX|j|t
|\}}YnX|j||r`|ddkr`|dd}n|jjtj	d|jtdd|r|dtkrt|\}}|j|||fS)
z angle-addr = [CFWS] "<" addr-spec ">" [CFWS] / obs-angle-addr
        obs-angle-addr = [CFWS] "<" obs-route addr-spec ">" [CFWS]

    r<z"expected angle-addr but found '{}'zangle-addr-startrjN>zangle-addr-endznull addr-spec in angle-addrz*obsolete route specification in angle-addrz.expected addr-spec or obs-route but found '{}'z"missing trailing '>' on angle-addr)
rqrrrSrrr#rrrrr
r)rZ
angle_addrr.rrrget_angle_addrgsJ






r
cCs<t}t|\}}|j|dd|jdd|_||fS)z display-name = phrase

    Because this is simply a name-rule, we don't return a display-name
    token containing a phrase, but rather a display-name token with
    the content of the phrase.

    N)rrr,r)rrbr.rrrget_display_names
rcCst}d}|dtkr6t|\}}|s6tjdj||ddkr|dtkr^tjdj|t|\}}|s~tjdj||dk	r|g|ddd<d}|j|t	|\}}|dk	r|g|dd<|j|||fS)z, name-addr = [display-name] angle-addr

    Nrz!expected name-addr but found '{}'r)
rirrrrr#rrrSr
)rZ	name_addrrr.rrr
get_name_addrs0

rcCst}yt|\}}WnNtjk
rdyt|\}}Wn&tjk
r^tjdj|YnXYnXtdd|jDrd|_|j	|||fS)z& mailbox = name-addr / addr-spec

    zexpected mailbox but found '{}'css|]}t|tjVqdS)N)rrr)rrrrrr szget_mailbox.<locals>.<genexpr>zinvalid-mailbox)
rtrrrrr#anyr&r7rS)rrcr.rrrget_mailboxs
rcCsht}xX|r^|d|kr^|dtkrF|jt|dd|dd}qt|\}}|j|qW||fS)z Read everything up to one of the chars in endchars.

    This is outside the formal grammar.  The InvalidMailbox TokenList that is
    returned acts like a Mailbox, but the data attributes are None.

    rzmisplaced-specialrjN)rurrSrr)rrZinvalid_mailboxr.rrrget_invalid_mailboxsrcCst}x|o|ddkryt|\}}|j|Wntjk
r@d}|dtkrt|\}}|sz|ddkr|j||jjtjdn@t	|d\}}|dk	r|g|dd<|j||jjtj
dnb|ddkr|jjtjdnBt	|d\}}|dk	r |g|dd<|j||jjtj
dYnX|r|ddkr|d
}d	|_t	|d\}}|j||jjtj
d|r
|ddkr
|jt
|dd}q
W||fS)aJ mailbox-list = (mailbox *("," mailbox)) / obs-mbox-list
        obs-mbox-list = *([CFWS] ",") mailbox *("," [mailbox / CFWS])

    For this routine we go outside the formal grammar in order to improve error
    handling.  We recognize the end of the mailbox list only at the end of the
    value or at a ';' (the group terminator).  This is so that we can turn
    invalid mailboxes into InvalidMailbox tokens and continue parsing any
    remaining valid mailboxes.  We also allow all mailbox entries to be null,
    and this condition is handled appropriately at a higher level.

    r;Nz,;zempty element in mailbox-listzinvalid mailbox in mailbox-listrrjzinvalid-mailboxrl)rdrrSrrrrrrrrr7r,r)rZmailbox_listr.rrcrrrget_mailbox_listsN













rcCst}|s$|jjtjd||fSd}|r|dtkrt|\}}|sl|jjtjd|j|||fS|ddkr|j|||fSt|\}}t|j	dkr|dk	r|j||j
||jjtjd||fS|dk	r|g|dd<|j|||fS)zg group-list = mailbox-list / CFWS / obs-group-list
        obs-group-list = 1*([CFWS] ",") [CFWS]

    zend of header before group-listNrzend of header in group-listrzgroup-list with empty entries)rfrrSrrrrrrkr_r,r)rZ
group_listrr.rrrget_group_list"s8







rcCs"t}t|\}}|s$|ddkr4tjdj||j||jtdd|dd}|r|ddkr|jtdd||ddfSt|\}}|j||s|jjtj	d	n|ddkrtjd
j||jtdd|dd}|r|dt
krt|\}}|j|||fS)z7 group = display-name ":" [group-list] ";" [CFWS]

    rrz8expected ':' at end of group display name but found '{}'zgroup-display-name-terminatorrjNrzgroup-terminatorzend of header in groupz)expected ';' at end of group but found {})rgrrrr#rSrrrrrr)rrar.rrr	get_groupGs2




rcCsxt}yt|\}}WnNtjk
rdyt|\}}Wn&tjk
r^tjdj|YnXYnX|j|||fS)a address = mailbox / group

    Note that counter-intuitively, an address can be either a single address or
    a list of addresses (a group).  This is why the returned Address object has
    a 'mailboxes' attribute which treats a single address as a list of length
    one.  When you need to differentiate between the two cases, extract the single
    element, which is either a mailbox or a group token.

    zexpected address but found '{}')r`rrrrr#rS)rr\r.rrrget_addresses
rcCst}x|ryt|\}}|j|Wn$tjk
rP}zd}|dtkrt|\}}|sr|ddkr|j||jjtjdnFt	|d\}}|dk	r|g|dd<|jt
|g|jjtjdnh|ddkr|jjtjdnHt	|d\}}|dk	r|g|dd<|jt
|g|jjtjdWYdd}~XnX|r|ddkr|d
d}d|_t	|d\}}|j
||jjtjd|r
|jtdd	|dd}q
W||fS)a address_list = (address *("," address)) / obs-addr-list
        obs-addr-list = *([CFWS] ",") address *("," [address / CFWS])

    We depart from the formal grammar here by continuing to parse until the end
    of the input, assuming the input to be entirely composed of an
    address-list.  This is always true in email parsing, and allows us
    to skip invalid addresses to parse additional valid ones.

    Nrrz"address-list entry with no contentzinvalid address in address-listzempty element in address-listrjzinvalid-mailboxzlist-separatorrl)r[rrSrrrrrrrr`rr7r,r)rZaddress_listr.errrrcrrrget_address_listsN












rcCst}|s |jjtjd|S|dtkrXt|\}}|j||sX|jjtjdd}x8|r|ddkr|dtkr||d7}|dd}q^W|js|jjtjdj	||jt
|d	nt||_|jt
|d
|o|dtkr
t|\}}|j||s |ddkrX|jdk	r>|jjtjd|rT|jt
|d	|S|jt
dd|dd}|r|dtkrt|\}}|j||s|jdk	r|jjtjd|Sd}x2|r|dtkr||d7}|dd}qW|js2|jjtjd
j	||jt
|d	nt||_
|jt
|d
|rv|dtkrvt|\}}|j||r|jjtjd|jt
|d	|S)zE mime-version = [CFWS] 1*digit [CFWS] "." [CFWS] 1*digit [CFWS]

    z%Missing MIME version number (eg: 1.0)rz0Expected MIME version number but found only CFWSrr	rjNz1Expected MIME major version number but found {!r}rdigitsz0Incomplete MIME version; found only major numberzversion-separatorz1Expected MIME minor version number but found {!r}z'Excess non-CFWS text after MIME version)rrrSrHeaderMissingRequiredValuerrisdigitrr#rintrr)rZmime_versionr.rrrrparse_mime_versionsv













rcCsht}xX|r^|ddkr^|dtkrF|jt|dd|dd}qt|\}}|j|qW||fS)z Read everything up to the next ';'.

    This is outside the formal grammar.  The InvalidParameter TokenList that is
    returned acts like a Parameter, but the data attributes are None.

    rrzmisplaced-specialrjN)rrrSrr)rZinvalid_parameterr.rrrget_invalid_parametersrcCsNt|}|stjdj||j}|t|d}t|d}t|||fS)a8ttext = <matches _ttext_matcher>

    We allow any non-TOKEN_ENDS in ttext, but add defects to the token's
    defects list if we find non-ttext characters.  We also register defects for
    *any* non-printables even though the RFC doesn't exclude all of them,
    because we follow the spirit of RFC 5322.

    zexpected ttext but found '{}'Nttext)_non_token_end_matcherrrr#rarkrr)rrr rrr	get_ttexts	
r"cCst}|r,|dtkr,t|\}}|j||rL|dtkrLtjdj|t|\}}|j||r|dtkrt|\}}|j|||fS)ztoken = [CFWS] 1*ttext [CFWS]

    The RFC equivalent of ttext is any US-ASCII chars except space, ctls, or
    tspecials.  We also exclude tabs even though the RFC doesn't.

    The RFC implies the CFWS but is not explicit about it in the BNF.

    rzexpected token but found '{}')	rMrrrS
TOKEN_ENDSrrr#r")rZmtokenr.rrr	get_token)s	


r$cCsNt|}|stjdj||j}|t|d}t|d}t|||fS)aQattrtext = 1*(any non-ATTRIBUTE_ENDS character)

    We allow any non-ATTRIBUTE_ENDS in attrtext, but add defects to the
    token's defects list if we find non-attrtext characters.  We also register
    defects for *any* non-printables even though the RFC doesn't exclude all of
    them, because we follow the spirit of RFC 5322.

    z expected attrtext but found {!r}Nr)_non_attribute_end_matcherrrr#rarkrr)rrrrrrget_attrtext@s	
r&cCst}|r,|dtkr,t|\}}|j||rL|dtkrLtjdj|t|\}}|j||r|dtkrt|\}}|j|||fS)aH [CFWS] 1*attrtext [CFWS]

    This version of the BNF makes the CFWS explicit, and as usual we use a
    value terminal for the actual run of characters.  The RFC equivalent of
    attrtext is the token characters, with the subtraction of '*', "'", and '%'.
    We include tab in the excluded set just as we do for token.

    rzexpected token but found '{}')	rrrrSATTRIBUTE_ENDSrrr#r&)rrr.rrr
get_attributeSs	


r(cCsNt|}|stjdj||j}|t|d}t|d}t|||fS)zattrtext = 1*(any non-ATTRIBUTE_ENDS character plus '%')

    This is a special parsing routine so that we get a value that
    includes % escapes as a single string (which we decode as a single
    string later).

    z)expected extended attrtext but found {!r}Nzextended-attrtext)#_non_extended_attribute_end_matcherrrr#rarkrr)rrrrrrget_extended_attrtextjs
r*cCst}|r,|dtkr,t|\}}|j||rL|dtkrLtjdj|t|\}}|j||r|dtkrt|\}}|j|||fS)z [CFWS] 1*extended_attrtext [CFWS]

    This is like the non-extended version except we allow % characters, so that
    we can pick up an encoded value as a single string.

    rzexpected token but found '{}')	rrrrSEXTENDED_ATTRIBUTE_ENDSrrr#r*)rrr.rrrget_extended_attribute|s


r,cCst}|s|ddkr(tjdj||jtdd|dd}|sX|djrhtjdj|d}x,|r|djr||d7}|dd}qnW|dd	kr|d	kr|jjtjd
t	||_
|jt|d||fS)a6 '*' digits

    The formal BNF is more complicated because leading 0s are not allowed.  We
    check for that and add a defect.  We also assume no CFWS is allowed between
    the '*' and the digits, though the RFC is not crystal clear on that.
    The caller should already have dealt with leading CFWS.

    r*zExpected section but found {}zsection-markerrjNz$Expected section number but found {}r0z'section number has an invalid leading 0r)rrrr#rSrrrZInvalidHeaderErrorrr)rrrrrrget_sections&	

r/cCst}|stjdd}|dtkr0t|\}}|sDtjdj||ddkr^t|\}}nt|\}}|dk	r|g|dd<|j|||fS)z  quoted-string / attribute

    z&Expected value but found end of stringNrz Expected value but found only {}r
)	rrrrrr#rr,rS)rvrr.rrr	get_values 

r1cCst}t|\}}|j||s.|ddkrN|jjtjdj|||fS|ddkry t|\}}d|_|j|Wntj	k
rYnX|stj	d|ddkr|jt
dd|dd	}d|_|dd
krtj	d|jt
d
d|dd	}d	}|r.|dtkr.t
|\}}|j|d	}|}|jrH|rH|dd
krHt|\}}|j}d}|jdkr|r|ddkrd}n$t|\}}	|	r|	ddkrd}n(yt|\}}	WnYnX|	sd}|r2|jjtjd|j|x,|D]$}
|
jdkrg|
d	d	<|
}PqW|}nd	}|jjtjd|rb|ddkrbd	}nt|\}}|js|jdkr|s|ddkr|j||d	k	r|st||}||fS|jjtjd|s|jjtjd|j||d	kr||fSn|d	k	rVx|D]}
|
jdkr"Pq"W|
jdk|j|
|
j|_|ddkrttj	dj||jt
dd|dd	}|r|ddkrt|\}}|j||j|_|s|ddkrtj	dj||jt
dd|dd	}|d	k	rZt}x>|rR|dtkr8t|\}}nt|\}}|j|qW|}nt|\}}|j||d	k	r|st||}||fS)aY attribute [section] ["*"] [CFWS] "=" value

    The CFWS is implied by the RFC but not made explicit in the BNF.  This
    simplified form of the BNF from the RFC is made to conform with the RFC BNF
    through some extra checks.  We do it this way because it makes both error
    recovery and working with the resulting parse tree easier.
    rrz)Parameter contains name ({}) but no valuer-TzIncomplete parameterzextended-parameter-markerrjN=zParameter not followed by '='zparameter-separatorr
F'z5Quoted string value for extended parameter is invalidzbare-quoted-stringzZParameter marked as extended but appears to have a quoted string value that is non-encodedzcApparent initial-extended-value but attribute was not marked as extended or was not initial sectionz(Missing required charset/lang delimiterszextended-attrtextrz=Expected RFC2231 char/lang encoding delimiter, but found {!r}zRFC2231-delimiterz;Expected RFC2231 char/lang encoding delimiter, but found {})rr(rSrrrr#r/rrrrrrrrVrr&r*r7r1AssertionErrorrrPrQrrrr)rrr.rrZappendtoZqstringZinner_valueZ
semi_validrtr0rrr
get_parameters























r6cCsht}xZ|rbyt|\}}|j|Wntjk
r}zd}|dtkrZt|\}}|sl|j||S|ddkr|dk	r|j||jjtjdn@t	|\}}|r|g|dd<|j||jjtjdj
|WYdd}~XnX|r@|ddkr@|d
}d|_t	|\}}|j||jjtjdj
||r
|jt
dd	|dd}q
W|S)a! parameter *( ";" parameter )

    That BNF is meant to indicate this routine should only be called after
    finding and handling the leading ';'.  There is no corresponding rule in
    the formal RFC grammar, but it is more convenient for us for the set of
    parameters to be treated as its own TokenList.

    This is a 'parse' routine because it consumes the remaining value, but it
    would never be called to parse a full header.  Instead it is called to
    parse everything after the non-parameter value of a specific MIME header.

    Nrrzparameter entry with no contentzinvalid parameter {!r}rjzinvalid-parameterz)parameter with invalid trailing text {!r}zparameter-separatorrl)rr6rSrrrrrrrr#r7r,r)rZmime_parametersr.rrrrrrparse_mime_parametersQ	sD







 

r7cCsxX|rX|ddkrX|dtkr@|jt|dd|dd}qt|\}}|j|qW|sbdS|jtdd|jt|dddS)zBDo our best to find the parameters in an invalid MIME header

    rrzmisplaced-specialrjNzparameter-separator)rrSrrr7)Z	tokenlistrr.rrr_find_mime_parameters	sr8cCst}d}|s$|jjtjd|Syt|\}}Wn8tjk
rl|jjtjdj|t	|||SX|j||s|ddkr|jjtjd|rt	|||S|j
jj|_
|jtdd|dd	}yt|\}}Wn:tjk
r$|jjtjd
j|t	|||SX|j||j
jj|_|sJ|S|ddkr|jjtjdj||`
|`t	|||S|jtdd
|jt|dd	|S)z maintype "/" subtype *( ";" parameter )

    The maintype and subtype are tokens.  Theoretically they could
    be checked against the official IANA list + x-token, but we
    don't do that.
    Fz"Missing content type specificationz(Expected content maintype but found {!r}rrzInvalid content typezcontent-type-separatorrjNz'Expected content subtype but found {!r}rz<Only parameters are valid after content type, but found {!r}zparameter-separator)rrrSrrr$rrr#r8rrlowerrrrr7)rZctypeZrecoverr.rrrparse_content_type_header	sX











r:c
Cst}|s |jjtjd|Syt|\}}Wn8tjk
rh|jjtjdj|t	|||SX|j||j
jj|_
|s|S|ddkr|jjtjdj|t	|||S|jtdd|jt|dd|S)	z* disposition-type *( ";" parameter )

    zMissing content dispositionz+Expected content disposition but found {!r}rrzCOnly parameters are valid after content disposition, but found {!r}zparameter-separatorrjN)rrrSrrr$rrr#r8rrr9rrr7)rZdisp_headerr.rrr parse_content_disposition_header	s2






r;cCst}|s |jjtjd|Syt|\}}Wn.tjk
r^|jjtjdj|YnX|j||j	j
j|_|s|Sx^|r|jjtjd|dt
kr|jt|dd|dd}qt|\}}|j|qW|S)z mechanism

    z!Missing content transfer encodingz1Expected content transfer encoding but found {!r}z*Extra text after content transfer encodingrzmisplaced-specialrjN)rrrSrrr$rrr#rrr9rOrrr)rZ
cte_headerr.rrr&parse_content_transfer_encoding_header	s.



r<cCsDd}|r@|dr@|ddtkr@|dd}|ddd	|d
<|S)Nrrjrlrlrlrlrlrlrlrl)r)linesZwsprrr_steal_trailing_WSP_if_exists
s
r>cCs|jptd}|jrdnd}dg}d}d}d}tdd}t|}	xz|	r|	jd}
|
|krf|d	8}qDt|
}|s|
jd
krtj	|}nt
j	|}y|j||}Wn6tk
rt
dd|
jDrd
}nd}d}YnX|
jdkrt|
|||qD|r|r|
jsd}d}|
jr|
j|ddt|j}
|j|
krt|
|t|dkrtt|}|j||d|
7<qDt|
dst|
|	}	nt|||||
j|}d}qDt||t|dkr|d|7<qD|
jr(t|d	|kr(t|}|s|
jr(|j||qDt|
ds`t|
}|
jsV|d	7}|j|||	}	qD|
jr|r|	jd|
d}qDt|}|s|
jr|j||qD|d|7<qDW|jj||jS)zLReturn string of contents of parse_tree folded according to RFC rules.

    z+infzutf-8zus-asciirNrFwrap_as_ew_blockedrjrcss|]}t|tjVqdS)N)rrr)rrrrrr @
sz%_refold_parse_tree.<locals>.<genexpr>zunknown-8bitTzmime-parameters)r/rrlrlrlrlrl)Zmax_line_lengthfloatutf8rrrrr7
SPECIALSNL
isdisjointNLSETrrrr&_fold_mime_parametersr)r;r1rklinesepr>rSr8_fold_as_ewr<r(insertr!)Z
parse_treer/maxlenrr=last_ewr?Z
want_encodingZend_ew_not_allowedrr*tstrrPZencoded_partnewlineZnewpartsrrrr0"
s










r0cCs|dk	r<|r<tt|d
|d|}|dd||d<|dtkr|d}|dd}t|d
|krz|jt||d|7<d}|dtkr|d}|dd}|dkrt|dn|}x|r|t|d}	|dkrdn|}
|	t|
d}|dkr|jdq|d|}tj||
d	}
t|
|	}|dkr\|d|}tj|}
|d|
7<|t|d}|r|jdt|d}qW|d|7<|r|SdS)aFold string to_encode into lines as encoded word, combining if allowed.
    Return the new value for last_ew, or None if ew_combine_allowed is False.

    If there is already an encoded word in the last line of lines (indicated by
    a non-None value for last_ew) and ew_combine_allowed is true, decode the
    existing ew, combine it with to_encode, and re-encode.  Otherwise, encode
    to_encode.  In either case, split to_encode as necessary so that the
    encoded segments fit within maxlen.

    Nrjrrzus-asciizutf-8r@)rPrlrlrlrlrlrlrlrlrlrlrlrlrl)rrrrkrSr>rr)Z	to_encoder=rIrJr<rPZleading_wspZtrailing_wspZnew_last_ewZremaining_spaceZ	encode_asZ
text_spaceZ
first_partrZexcessrrrrG
sF




rGcCsx|jD]\}}|djjds6|dd7<|}d}y|j|d}Wn0tk
rd}tj|rxd}d}nd}YnX|rtjj	|d	|d
}	dj
|||	}
ndj
|t|}
t|dt|
d|kr|dd
|
|d<q
n"t|
d|kr|j
d
|
q
d}|d}x|rt|tt|dt|}
||
dkrTd}||
d}}x<|d|}tjj	|d	|d
}	t|	|krP|d8}qfW|j
dj
||||	d	}|d7}||d}|r|dd7<qWq
WdS)a>Fold TokenList 'part' into the 'lines' list as mime parameters.

    Using the decoded list of parameters and values, format them according to
    the RFC rules, including using RFC2231 encoding if the value cannot be
    expressed in 'encoding' and/or the parameter+value is too long to fit
    within 'maxlen'.

    rjrstrictFTzunknown-8bitrzutf-8r)Zsaferz
{}*={}''{}z{}={}r@rhrz''r{NNz {}*{}*={}{}rlrlrlrlrlrl)rr|rrrrrrrrYr#rrkrSr)r*r=rIrrrrPZ
error_handlerZencoding_requiredZ
encoded_valuerKrZextra_chromeZ
chrome_lenZ
splitpointZmaxcharspartialrrrrE
s\


 rE)__doc__rerstringrcollectionsroperatorrZemailrrrrrrrrrrrZ	TSPECIALSr#Z	ASPECIALSr'r+rDrBrrrr?rDrFrHrIrKrMrNrRrWrXr[r`rdrfrgrirqrsrtrurvrxryrzrrrrrrrrrrrrrrrrrrrrrrrrr	compiler#r!rwrrmatchrfindallrr!r%r)rrrrrrrrrrrrrrrrrrrrrrrrr
r
rrrrrrrrrrrr"r$r&r(r*r,r/r1r6r7r8r:r;r<r>r0rGrErrrr<module>DsC"	
!($
V	+




   

*8"
&'/'&).9%>D49/j7