3 υ_)@sddlmZmZmZddlmZmZddlmZm Z ddl Z ddl Z ddl m Z ddlmZmZmZmZddlmZdd lmZdd lmZydd lmZWnek reZYnXed d eDZedd eDZedd eDZeeddgBZdZej r(e j!eddFe"ddZ#n e j!eZ#e$dddddddddddd d!d"d#d$d%d&d'd(d)d*d+d,d-d.d/d0d1d2d3d4g Z%e j!d5Z&iZ'Gd6d7d7e(Z)d8d9Z*Gd:d;d;e(Z+Gdd?d?e-Z.Gd@dAdAe(Z/GdBdCdCe(Z0dDdEZ1dS)G)absolute_importdivisionunicode_literals) text_type binary_type) http_clienturllibN) webencodings)EOFspaceCharacters asciiLettersasciiUppercase)ReparseException)_utils)StringIO)BytesIOcCsg|]}|jdqS)ascii)encode).0itemr"/usr/lib/python3.6/_inputstream.py srcCsg|]}|jdqS)r)r)rrrrrrscCsg|]}|jdqS)r)r)rrrrrrs>.)sumr )r"rrr_bufferedBytes^szBufferedStream._bufferedBytescCs<|jj|}|jj||jdd7<t||jd<|S)Nrr )rr.r appendr!r$)r"r-datarrrr+as   zBufferedStream._readStreamcCs|}g}|jd}|jd}x|t|jkr|dkr|j|}|t||krb|}|||g|_n"t||}|t|g|_|d7}|j||||||8}d}qW|r|j|j|dj|S)Nrr )r!r$r r1r+join)r"r-ZremainingBytesrvZ bufferIndexZ bufferOffsetZ bufferedDataZ bytesToReadrrrr,hs$    zBufferedStream._readFromBufferN) __name__ __module__ __qualname____doc__r#r'r*r.r0r+r,rrrrr9s  rcKst|tjs(t|tjjr.t|jtjr.d}n&t|drJt|jdt }n t|t }|rdd|D}|rvt d|t |f|St |f|SdS)NFr.rcSsg|]}|jdr|qS)Z _encoding)endswith)rxrrrrsz#HTMLInputStream..z3Cannot set an encoding with a unicode input, set %r) isinstancerZ HTTPResponserZresponseZaddbasefphasattrr.r TypeErrorHTMLUnicodeInputStreamHTMLBinaryInputStream)sourcekwargsZ isUnicodeZ encodingsrrrHTMLInputStreams     rDc@speZdZdZdZddZddZddZd d Zd d Z d dZ dddZ ddZ ddZ dddZddZdS)r@zProvides a unicode stream of characters to the HTMLTokenizer. This class takes care of character encoding and removing or replacing incorrect byte-sequences and also provides column and line tracking. i(cCsZtjsd|_ntddkr$|j|_n|j|_dg|_tddf|_|j ||_ |j dS)aInitialises the HTMLInputStream. HTMLInputStream(source, [encoding]) -> Normalized stream from source for use by html5lib. source can be either a file-object, local filename or a string. The optional encoding parameter must be a string that indicates the encoding. If specified, that encoding will be used, regardless of any BOM or later declaration (such as in a meta element) Nu􏿿r rzutf-8certain) rsupports_lone_surrogatesreportCharacterErrorsr$characterErrorsUCS4characterErrorsUCS2ZnewLineslookupEncoding charEncoding openStream dataStreamreset)r"rBrrrr#s   zHTMLUnicodeInputStream.__init__cCs.d|_d|_d|_g|_d|_d|_d|_dS)Nr)r& chunkSize chunkOffseterrors prevNumLines prevNumCols_bufferedCharacter)r"rrrrNszHTMLUnicodeInputStream.resetcCst|dr|}nt|}|S)zvProduces a file object from source. source can be either a file object, local filename or a string. r.)r>r)r"rBrrrrrLs z!HTMLUnicodeInputStream.openStreamcCsT|j}|jdd|}|j|}|jdd|}|dkr@|j|}n ||d}||fS)N rr r)r&countrSrfindrT)r"r(r&ZnLinesZ positionLineZ lastLinePosZpositionColumnrrr _positions   z HTMLUnicodeInputStream._positioncCs|j|j\}}|d|fS)z:Returns (line, col) of the current position in the stream.r )rYrQ)r"linecolrrrr!szHTMLUnicodeInputStream.positioncCs6|j|jkr|jstS|j}|j|}|d|_|S)zo Read one character from the stream or queue if available. Return EOF when EOF is reached. r )rQrP readChunkr r&)r"rQcharrrrr]s   zHTMLUnicodeInputStream.charNcCs|dkr|j}|j|j\|_|_d|_d|_d|_|jj|}|j rX|j |}d|_ n|s`dSt |dkrt |d }|dksd|kodknr|d |_ |dd}|j r|j ||j dd }|j d d }||_t ||_d S)NrOrFr iiz rV Trrr)_defaultChunkSizerYrPrSrTr&rQrMr.rUr$ordrGreplace)r"rPr2Zlastvrrrr\s0           z HTMLUnicodeInputStream.readChunkcCs,x&tttj|D]}|jjdqWdS)Nzinvalid-codepoint)ranger$invalid_unicode_refindallrRr1)r"r2_rrrrH%sz*HTMLUnicodeInputStream.characterErrorsUCS4cCsd}xtj|D]}|rqt|j}|j}tj|||drttj|||d}|tkrn|j j dd}q|dkr|dkr|t |dkr|j j dqd}|j j dqWdS)NFzinvalid-codepointTiir ) rdfinditerragroupstartrZisSurrogatePairZsurrogatePairToCodepointnon_bmp_invalid_codepointsrRr1r$)r"r2skipmatchZ codepointr%Zchar_valrrrrI)s   z*HTMLUnicodeInputStream.characterErrorsUCS2Fc Csyt||f}WnNtk r^djdd|D}|s@d|}tjd|}t||f<YnXg}x||j|j|j}|dkr|j|jkrPn0|j }||jkr|j |j|j|||_P|j |j|jd|j sfPqfWdj|}|S)z Returns a string of characters from the stream up to but not including any character in 'characters' or EOF. 'characters' must be a container that supports the 'in' method and iteration over its characters. rOcSsg|]}dt|qS)z\x%02x)ra)rcrrrrNsz5HTMLUnicodeInputStream.charsUntil..z^%sz[%s]+N) charsUntilRegExKeyErrorr4recompilermr&rQrPendr1r\) r"Z charactersZoppositecharsZregexr5mrsrrrr charsUntil@s.    z!HTMLUnicodeInputStream.charsUntilcCs@|dk r<|jdkr.||j|_|jd7_n|jd8_dS)Nrr )rQr&rP)r"r]rrrungetos   zHTMLUnicodeInputStream.unget)N)F)r6r7r8r9r`r#rNrLrYr!r]r\rHrIrwrxrrrrr@s   & /r@c@sLeZdZdZdddZddZd d Zdd d Zd dZddZ ddZ dS)rAzProvides a unicode stream of characters to the HTMLTokenizer. This class takes care of character encoding and removing or replacing incorrect byte-sequences and also provides column and line tracking. N windows-1252TcCs\|j||_tj||jd|_d|_||_||_||_||_ ||_ |j ||_ |j dS)aInitialises the HTMLInputStream. HTMLInputStream(source, [encoding]) -> Normalized stream from source for use by html5lib. source can be either a file-object, local filename or a string. The optional encoding parameter must be a string that indicates the encoding. If specified, that encoding will be used, regardless of any BOM or later declaration (such as in a meta element) idN)rL rawStreamr@r# numBytesMetanumBytesChardetoverride_encodingtransport_encodingsame_origin_parent_encodinglikely_encodingdefault_encodingdetermineEncodingrKrN)r"rBr~rrrrZ useChardetrrrr#s  zHTMLBinaryInputStream.__init__cCs&|jdjj|jd|_tj|dS)Nrrb)rKZ codec_info streamreaderr{rMr@rN)r"rrrrNszHTMLBinaryInputStream.resetc CsDt|dr|}nt|}y|j|jWnt|}YnX|S)zvProduces a file object from source. source can be either a file object, local filename or a string. r.)r>rr*r'r)r"rBrrrrrLs z HTMLBinaryInputStream.openStreamc Cs|jdf}|ddk r|St|jdf}|ddk r:|St|jdf}|ddk rX|S|jdf}|ddk rt|St|jdf}|ddk r|djjd r|St|jdf}|ddk r|S|rdyddl m }Wnt k rYnxXg}|}x6|j s.|j j|j}|sP|j||j|qW|jt|jd}|j jd|dk rd|dfSt|jdf}|ddk r|StddfS)NrErZ tentativezutf-16)UniversalDetectorencodingz windows-1252) detectBOMrJr~rdetectEncodingMetarname startswithrZchardet.universaldetectorr ImportErrordoner{r.r}r1Zfeedcloseresultr*r)r"ZchardetrKrZbuffersZdetectorr rrrrrsP           z'HTMLBinaryInputStream.determineEncodingcCst|}|dkrdS|jdkr(td}nT||jdkrH|jddf|_n4|jjd|df|_|jtd|jd|fdS)Nutf-16beutf-16lezutf-8rrEzEncoding changed from %s to %s)rr)rJrrKr{r*rNr)r"Z newEncodingrrrchangeEncodings   z$HTMLBinaryInputStream.changeEncodingc Cstjdtjdtjdtjdtjdi}|jjd}|j|dd}d}|sp|j|}d}|sp|j|dd }d }|r|jj |t |S|jj d dSdS) zAttempts to detect at BOM at the start of the stream. If an encoding can be determined from the BOM return the name of the encoding otherwise return Nonezutf-8zutf-16lezutf-16bezutf-32lezutf-32beNrgr) codecsBOM_UTF8 BOM_UTF16_LE BOM_UTF16_BE BOM_UTF32_LE BOM_UTF32_BEr{r.getr*rJ)r"ZbomDictstringrr*rrrrs"     zHTMLBinaryInputStream.detectBOMcCsH|jj|j}t|}|jjd|j}|dk rD|jdkrDtd}|S)z9Report the encoding declared by the meta element rNutf-16beutf-16lezutf-8)rr)r{r.r|EncodingParserr* getEncodingrrJ)r"r parserrrrrr9s z(HTMLBinaryInputStream.detectEncodingMeta)NNNNryT)T) r6r7r8r9r#rNrLrrrrrrrrrAs ( >"rAc@seZdZdZddZddZddZdd Zd d Zd d Z ddZ ddZ e e e Z ddZe eZefddZddZddZddZdS) EncodingByteszString-like object with an associated position and various extra methods If the position is ever greater than the string length then an exception is raisedcCstj||jS)N)r-__new__lower)r"valuerrrrLszEncodingBytes.__new__cCs d|_dS)Nr r)rY)r"rrrrr#PszEncodingBytes.__init__cCs|S)Nr)r"rrr__iter__TszEncodingBytes.__iter__cCs>|jd}|_|t|kr"tn |dkr.t|||dS)Nr r)rYr$ StopIterationr?)r"prrr__next__Ws  zEncodingBytes.__next__cCs|jS)N)r)r"rrrnext_szEncodingBytes.nextcCsB|j}|t|krtn |dkr$t|d|_}|||dS)Nrr )rYr$rr?)r"rrrrpreviouscs zEncodingBytes.previouscCs|jt|krt||_dS)N)rYr$r)r"r!rrr setPositionlszEncodingBytes.setPositioncCs*|jt|krt|jdkr"|jSdSdS)Nr)rYr$r)r"rrr getPositionqs  zEncodingBytes.getPositioncCs||j|jdS)Nr )r!)r"rrrgetCurrentByte{szEncodingBytes.getCurrentBytecCsL|j}x:|t|kr@|||d}||kr6||_|S|d7}qW||_dS)zSkip past a list of charactersr N)r!r$rY)r"rtrrnrrrrls zEncodingBytes.skipcCsL|j}x:|t|kr@|||d}||kr6||_|S|d7}qW||_dS)Nr )r!r$rY)r"rtrrnrrr skipUntils zEncodingBytes.skipUntilcCs>|j}|||t|}|j|}|r:|jt|7_|S)zLook for a sequence of bytes at the start of a string. If the bytes are found return True and advance the position to the byte after the match. Otherwise return False and leave the position alone)r!r$r)r"r-rr2r5rrr matchBytess  zEncodingBytes.matchBytescCsR||jdj|}|dkrJ|jdkr,d|_|j|t|d7_dStdS)zLook for the next sequence of bytes matching a given sequence. If a match is found advance the position to the last byte of the matchNr rTrr)r!findrYr$r)r"r-Z newPositionrrrjumpTos zEncodingBytes.jumpToN)r6r7r8r9rr#rrrrrrpropertyr!r currentBytespaceCharactersBytesrlrrrrrrrrHs      rc@sXeZdZdZddZddZddZdd Zd d Zd d Z ddZ ddZ ddZ dS)rz?Mini parser for detecting character encoding from meta elementscCst||_d|_dS)z3string - the data to work on for encoding detectionN)rr2r)r"r2rrrr#s zEncodingParser.__init__c Csd|jfd|jfd|jfd|jfd|jfd|jff}x^|jD]T}d}xD|D]<\}}|jj|rJy |}PWqJtk rd}PYqJXqJW|s)r2r)r"rrrrszEncodingParser.handleCommentcCs|jjtkrdSd}d}x|j}|dkr.dS|ddkr^|ddk}|r|dk r||_dSq|ddkr|d}t|}|dk r||_dSq|ddkrtt|d}|j}|dk rt|}|dk r|r||_dS|}qWdS) NTFrs http-equivr s content-typescharsetscontent) r2rr getAttributerrJContentAttrParserrparse)r"Z hasPragmaZpendingEncodingattrZtentativeEncodingcodecZ contentParserrrrrs:      zEncodingParser.handleMetacCs |jdS)NF)handlePossibleTag)r"rrrrsz%EncodingParser.handlePossibleStartTagcCst|j|jdS)NT)rr2r)r"rrrrs z#EncodingParser.handlePossibleEndTagcCsf|j}|jtkr(|r$|j|jdS|jt}|dkrD|jn|j}x|dk r`|j}qNWdS)NTr)r2rasciiLettersBytesrrrspacesAngleBracketsr)r"ZendTagr2rnrrrrrs     z EncodingParser.handlePossibleTagcCs |jjdS)Nr)r2r)r"rrrrszEncodingParser.handleOthercCs|j}|jttdgB}|dkr&dSg}g}xt|dkr@|r@PnX|tkrT|j}PnD|d krjdj|dfS|tkr|j|jn|dkrdS|j|t|}q0W|dkr|j dj|dfSt||j}|d kr:|}xt|}||krt|dj|dj|fS|tkr*|j|jq|j|qWnJ|dkrRdj|dfS|tkrl|j|jn|dkrzdS|j|x^t|}|t krdj|dj|fS|tkr|j|jn|dkrdS|j|qWdS) z_Return a name,value pair for the next attribute in the stream, if one is found, or None/rN=r3'")rN)rr)rr) r2rlr frozensetr4asciiUppercaseBytesr1rrrr)r"r2rnZattrNameZ attrValueZ quoteCharrrrrsf             zEncodingParser.getAttributeN) r6r7r8r9r#rrrrrrrrrrrrrs$rc@seZdZddZddZdS)rcCs ||_dS)N)r2)r"r2rrrr#fszContentAttrParser.__init__cCsy|jjd|jjd7_|jj|jjdks8dS|jjd7_|jj|jjdkr|jj}|jjd7_|jj}|jj|r|j||jjSdSnF|jj}y|jjt|j||jjStk r|j|dSXWntk rdSXdS)Nscharsetr rrr)rr)r2rr!rlrrrr)r"Z quoteMarkZ oldPositionrrrrjs.       zContentAttrParser.parseN)r6r7r8r#rrrrrresrcCs`t|tr.y|jd}Wntk r,dSX|dk rXy tj|Stk rTdSXndSdS)z{Return the python codec name corresponding to an encoding or None if the string doesn't correspond to a valid encoding.rN)r<rdecodeUnicodeDecodeErrorr lookupAttributeError)rrrrrJs  rJr)2Z __future__rrrZpip._vendor.sixrrZpip._vendor.six.movesrrrrqZ pip._vendorr Z constantsr r r rrrOriorrrrrrrrZinvalid_unicode_no_surrogaterFrrevalrdsetrkZascii_punctuation_reroobjectrrDr@rAr-rrrrJrrrrsV               JgIh6'