diff --git a/KNOWN_DEPLOYED_INSTANCES.md b/KNOWN_DEPLOYED_INSTANCES.md index 43fe1de3..26e477e5 100644 --- a/KNOWN_DEPLOYED_INSTANCES.md +++ b/KNOWN_DEPLOYED_INSTANCES.md @@ -4,6 +4,10 @@ * Production: (proxied to add https and domain): https://archive.org/services/context/wari/ * Development: None +### Statistics +All non-debug output from the endpoints is stored in a cache on the backend. +As of 2023-06-05 the cache contains 431,381 json files using 3.3 GB of space. + ### Endpoint example urls for the production endpoint The article/all/check-url/check-doi endpoints all return cached responses if you don’t add the refresh=true parameter. The references and reference endpoints always return cached data. diff --git a/README.md b/README.md index 51447f7f..238b65b9 100644 --- a/README.md +++ b/README.md @@ -1,54 +1,74 @@ # Internet Archive Reference Inventory [IARI](https://www.wikidata.org/wiki/Q117023013) -This tool is capable of fetching, extracting, transforming and storing -reference information from Wikipedia articles as [structured data](https://www.wikidata.org/wiki/Q26813700). - -IARI is currently an API with a few endpoints which hopefully makes it easy for others -to interact with. -On the longer term we are planning on populating a [Wikibase.cloud](https://wikibase.cloud/) instance -based on the data we extract. -We call this resulting database Wikipedia Citations Database (WCD). +This API is capable of fetching, extracting, transforming and storing +reference information from Wikipedia articles as [structured data](https://www.wikidata.org/wiki/Q26813700).
-IARI has been developed by [James Hare](https://www.wikidata.org/wiki/Q23041486) -and [Dennis Priskorn](https://www.wikidata.org/wiki/Q111016131) as part of the -[Turn All References Blue project](https://www.wikidata.org/wiki/Q115136754) which is led by -Mark Graham, head of The -[Wayback Machine](https://www.wikidata.org/wiki/Q648266) department of the -[Internet Archive](https://www.wikidata.org/wiki/Q461). +The endpoints make it possible to get structured data about the references +from any Wikipedia article in any language version. +# Background There are [at least 200 million references in at least 40 million articles]( https://ieeexplore.ieee.org/abstract/document/9908858) and together with the article text in the Wikipedias they form one of the most valuable collections of knowledge ever made by humans, see [size comparison](https://en.wikipedia.org/wiki/Wikipedia:Size_comparisons). -The endpoint providing a detailed analysis of a Wikipedia article and its references -enable wikipedians to get an overview of the state of the references and -build tools that help curate and improve the references. +The Wikimedia movement currently does not have good and effective tools +to help editors keep up the quality of references over time. +The references are stored in templates that differ between language versions of Wikipedia +which makes it hard for tool developers to develop good tools that work well +across different language versions. + +# Author +IARI has been developed by [Dennis Priskorn](https://www.wikidata.org/wiki/Q111016131) as part of the +[Turn All References Blue project](https://www.wikidata.org/wiki/Q115136754) which is led by +Mark Graham, head of the +[Wayback Machine](https://www.wikidata.org/wiki/Q648266) department of the +[Internet Archive](https://www.wikidata.org/wiki/Q461).
+ +# Goals +The endpoint providing a detailed analysis of a Wikipedia article and its references +enables Wikipedians to get an overview of the state of the references, and using the API it is +possible for the Wikimedia tech community to build tools that make it easier to curate +and improve the references. This is part of a wider initiative to help raise the quality of references in Wikipedia to enable everyone in the world to make decisions based on trustworthy knowledge that is derived from trustworthy sources. +# Stepping stone for a (graph) database of all references +This project is a part of the [Wikicite initiative](http://wikicite.org/). + +In the longer term the Turn All References Blue project is planning on populating a database +based on the data we extract. +This part of the effort is led by [James Hare](https://www.wikidata.org/wiki/Q23041486). + +The end goal is a large database with all references from all Wikipedias. +We call it the Wikipedia Citations Database (WCD). + # Features + IARI features a number of endpoints that help patrons get structured data about references in a Wikipedia article: + * an _article_ endpoint which analyzes a given article and returns basic statistics about it * a _references_ endpoint which gives back all ids of references found * a _reference_ endpoint which gives back all details about a reference including templates and wikitext * a _check-url_ endpoint which looks up the URL and gives back -standardized information about its status + standardized information about its status * a _check-doi_ endpoint which looks up the DOI and gives back -standardized information about it from [FatCat](https://fatcat.wiki/), OpenAlex and Wikidata -including abstract, retracted status, and more. + standardized information about it from [FatCat](https://fatcat.wiki/), OpenAlex and Wikidata + including abstract, retracted status, and more. * a _pdf_ endpoint which extracts links both from annotations and free text from PDFs.
* an _xhtml_ endpoint which extracts links from any XHTML page. # Limitations + See known limitations under each endpoint below. # Supported Wikipedias + Currently we support a handful of the 200+ language versions of Wikipedia but we plan on extending the support to all Wikipedia language versions and you can help us by submitting sections to search for references in issues and @@ -58,22 +78,25 @@ We also would like to support non-Wikimedia wikis using MediaWiki in the future and perhaps also any webpage on the internet with outlinks (e.g. news articles). ## Wikipedia templates + English Wikipedia for example has hundreds of special reference templates in use and a handful of widely used generic templates. WARI exposes them all when found in a reference. ## Reference types detected by the ArticleAnalyzer + We support detecting the following types. A reference cannot have multiple types. We distinguish between two main types of references: -1) **named reference** - type of reference with only a name and no content e.g. '' (Unsupported -beyond counting because we have not decided if they contain any value) + +1) **named reference** - type of reference with only a name and no content e.g.
'' (Unsupported + beyond counting because we have not decided if they contain any value) 2) **content reference** - reference with any type of content - 1) **general reference** - subtype of content reference which is outside a `<ref>` tag and usually found in - sections called "Further reading" or "Bibliography" - (unsupported but we want to support it, see - https://github.com/internetarchive/wcdimportbot/labels/general%20reference) - ![image](https://user-images.githubusercontent.com/68460690/208092363-ba4b5346-cad7-495e-8aff-1aa4f2f0161e.png) - 2) **footnote reference** - subtype of content reference which is inside a `<ref>` tag (supported partially, see below) + 1) **general reference** - subtype of content reference which is outside a `<ref>` tag and usually found in + sections called "Further reading" or "Bibliography" + (unsupported but we want to support it, see + https://github.com/internetarchive/wcdimportbot/labels/general%20reference) + ![image](https://user-images.githubusercontent.com/68460690/208092363-ba4b5346-cad7-495e-8aff-1aa4f2f0161e.png) + 2) **footnote reference** - subtype of content reference which is inside a `<ref>` tag (supported partially, see below) Example of a URL-template reference: `Mueller Report, p12 {{url|http://example.com}} {{bare url inline}}` @@ -84,9 +107,13 @@ Example of a plain text reference: This is a footnote reference -> content reference -> Short citation reference aka naked named footnote. # Endpoints + ## Checking endpoints + ### Check URL + the check-url endpoint accepts the following parameters: + * url (mandatory) * refresh (optional) * testing (optional) @@ -95,23 +122,31 @@ the check-url endpoint accepts the following parameters: On error it returns 400. #### Known limitations -Sometimes we get back a 403 because an intermediary like Cloudflare detected that we are not a person behind a browser doing the request. We don't have any ways to detect these soft200s.
-Also sometimes a server returns status code 200 but the content is an error page or any other content than what the person asking for the information wants. These are called soft404s and we currently do not make any effort to detect them. +Sometimes we get back a 403 because an intermediary like Cloudflare detected that we are not a person behind a browser +making the request. We don't have any way to detect these soft200s. + +Also sometimes a server returns status code 200 but the content is an error page or any other content than what the +person asking for the information wants. These are called soft404s and we currently do not make any effort to detect +them. You are very welcome to suggest improvements by opening an issue or sending a pull request. :) ## Statistics + ### article + the statistics/article endpoint accepts the following parameters: + * url (mandatory) * refresh (optional) * testing (optional) -On error it returns 400. On timeout it returns 504 or 502 +On error it returns 400. On timeout it returns 504 or 502 (this is a bug and should be reported). It will return json similar to: + ``` { "wari_id": "en.wikipedia.org.999263", @@ -192,15 +227,20 @@ It will return json similar to: } } ``` + #### Known limitations + * the general references parsing relies on 2 things: - * a manually supplied list of sections to search using the 'regex' to the article and all endpoints. The list is case insensitive and should be delimited by the '|' character. - * that every line with a general reference begins with a star character (*) + * a manually supplied list of sections to search, passed via the 'regex' parameter of the article and all endpoints. The list is + case insensitive and should be delimited by the '|' character. + * that every line with a general reference begins with a star character (*) Any line that doesn't begin with a star is ignored.
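The two limitations above can be illustrated with a small sketch. It only builds a request URL and mirrors the documented star rule; it does not reproduce IARI's parser. The helper names `build_article_url` and `looks_like_general_reference` are hypothetical, and the base URL is an assumption combining the production address from KNOWN_DEPLOYED_INSTANCES.md with the `/v2` prefix seen in the curl examples.

```python
from urllib.parse import urlencode

# Assumption: production base URL plus the /v2 prefix used in the curl examples.
BASE_URL = "https://archive.org/services/context/wari/v2"


def build_article_url(article_url, sections, refresh=False):
    """Build a statistics/article request URL (hypothetical helper).

    `sections` holds the section titles to search for general references.
    They are joined with '|' because the 'regex' list is '|'-delimited.
    """
    params = {"url": article_url, "regex": "|".join(sections)}
    if refresh:
        # Without refresh=true the endpoint may answer from the cache.
        params["refresh"] = "true"
    return f"{BASE_URL}/statistics/article?{urlencode(params)}"


def looks_like_general_reference(line):
    """Mirror the documented rule: only lines beginning with '*' count."""
    return line.startswith("*")


if __name__ == "__main__":
    print(build_article_url(
        "https://en.wikipedia.org/wiki/Test",
        ["external links", "further reading"],
    ))
```

Since the section list is documented as case insensitive, lower-case titles such as `external links` should be sufficient.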
### references + the statistics/references endpoint accepts the following parameters: + * wari_id (mandatory, str) (this is obtained from the article endpoint) * all (optional, boolean) (default: false) * offset (optional, int) (default: 0) @@ -209,6 +249,7 @@ the statistics/references endpoint accepts the following parameters: On error it returns 400. If data is not found it returns 404. It will return json similar to: + ``` { "total": 31, @@ -229,16 +270,21 @@ It will return json similar to: ] } ``` + #### Known limitations + None ### reference + the statistics/reference/id endpoint accepts the following parameters: + * id (mandatory) (this is unique for each reference and is obtained from the article or references endpoint) On error it returns 400. If data is not found it returns 404. It will return json similar to: + ``` { "id": "cfa8b438", @@ -254,184 +300,223 @@ It will return json similar to: "served_from_cache": true } ``` + #### Known limitations + None ### PDF + the statistics/pdf endpoint accepts the following parameters: + * url (mandatory) * refresh (bool, optional) * testing (bool, optional) * timeout (int, optional) -* debug (bool, optional) +* debug (bool, optional) Note: you have to set refresh=true to be sure to get debug output -On error it returns 404 or 415. The first is when we could not find/fetch the url +On error it returns 404 or 415. The first is when we could not find/fetch the url and the second is when it is not a valid PDF. -The `urls_fixed` object has an array of fixed url fragments in case any were fixed. See [this output](https://archive.org/services/context/wari/v2/statistics/pdf?url=https://s3.documentcloud.org/documents/23782225/mwg-fdr-document-04-16-23-1.pdf&refresh=true). 
- If not given debug=true it will return json similar to: + ``` { - "words_mean": 306, - "words_max": 462, - "words_min": 0, - "annotation_links": [], - "text_links": [ - { - "url": "https://www.rfc-editor.org/info/rfc791.", - "page": 168 - }, - { - "url": "https://www.ccbe.eu/fileadmin/speciality_distribution/public/documents/SURVEILLANCE/SVL_Guides_recommendations/EN_SVL_20190329_CCBE-Recommendations-on-the-protection-of-fundamental-rights-in-the-context-of-national-security.pdf.", - "page": 182 - }, - { - "url": "https://www.undom.se", - "page": 204 - }, - { - "url": "https://bra.se/statistik/kriminalstatistik/anmalda-brott/om-statistiken.html.", - "page": 255 - }, - { - "url": "https://bra.se/statistik/publiceringsplan.html.", - "page": 259 - }, - { - "url": "https://www.ft.dk/ripdf/samling/20211/lovforslag/l93/20211_l93_som_fremsat.pdf.", - "page": 288 - }, - { - "url": "https://it-ord.idg.se/.", - "page": 325 - }, + "words_mean": 213, + "words_max": 842, + "words_min": 41, + "annotation_links": [ { - "url": "http://www.ne.se/uppslagsverk/encyklopedi/l\u00e5ng/ott.", - "page": 325 + "url": "https://web.archive.org/web/20210501230502/cisa.gov/mdm", + "page": 0 }, { - "url": "https://svenskarnaochinternet.se/rapporter/svenskarna-och-internet-2021.", - "page": 329 + "url": "https://web.archive.org/web/20210501230502/cisa.gov/mdm", + "page": 0 }, { - "url": "https://www.rfc-editor.org/pdfrfc/rfc791.txt.pdf", - "page": 330 + "url": "https://web.archive.org/web/20210501230502/cisa.gov/mdm", + "page": 0 }, { - "url": "https://www.rfc-editor.org/pdfrfc/rfc1883.txt.pdf.", - "page": 330 + "url": "https://web.archive.org/web/20210501230502/cisa.gov/mdm", + "page": 0 }, { - "url": "https://www.ripe.net/publications/news/about-ripe-ncc-and-ripe/the-ripe-ncc-has-run-out-of-ipv4-addresses.", - "page": 330 + "url": "https://judiciary.house.gov/media/press-releases/chairman-jim-jordan-subpoenas-big-tech-executives", + "page": 0 }, { - "url": 
"https://itc.ktu.lt/index.php/ITC/article/view/14451.", - "page": 332 + "url": "https://rumble.com/v1gx8h7-dhss-foreign-to-domestic-disinformation-switcheroo.html", + "page": 1 }, { - "url": "https://www.dn.se/debatt/eus-nya-massovervakning-far-inte-forstora-kallskyddet/,", - "page": 339 + "url": "https://web.archive.org/web/20230224163731/cisa.gov/mdm", + "page": 3 }, { - "url": "https://www.svd.se/a/oneeLR/eu-forslaget-innebar-en-orimlig-overvakning-skriver-paarup-petersen,", - "page": 339 + "url": "https://www.cisa.gov/topics/election-security/foreign-influence-operations-and-disinformation", + "page": 3 }, { - "url": "https://www.svd.se/a/zEkO4r/helene-fritzon-s-sexuella-overgrepp-mot-barn-maste-upptackas,", - "page": 339 - }, - { - "url": "https://www.svt.se/nyheter/utrikes/eu-forslaget-chat-control-kritiseras.", - "page": 339 - }, - { - "url": "https://statistik.pts.se/svensk-telekommarknad/.", - "page": 342 - }, - { - "url": "https://polisen.se/siteassets/dokument/organiserad_brottslighet/rapport_org_brottslighet_2019_webb_200326.pdf.", - "page": 344 - }, - { - "url": "http://www.ne.se/uppslagsverk/encyklopedi/l\u00e5ng/rundradio.", - "page": 402 - }, - { - "url": "https://www.coe.int/en/web/conventions/full-list/-/conventions/treaty/185/signatures.", - "page": 439 - }, - { - "url": "https://rm.coe.int/16806f943e.", - "page": 441 - }, - { - "url": "https://rm.coe.int/16806a495e.", - "page": 443 - }, - { - "url": "https://rm.coe.int/terms-of-reference-for-the-preparation-of-a-draft-2nd-additional-proto/168072362b.", - "page": 444 - }, - { - "url": "https://rm.coe.int/1680a49dab.", - "page": 444 - }, - { - "url": "https://rm.coe.int/0900001680a49f74/.", - "page": 445 - }, - { - "url": "https://data.consilium.europa.eu/doc/document/ST-15072-2016-REV-1/en/pdf.", - "page": 446 - }, - { - "url": "https://www.congress.gov/bill/115th-congress/house-bill/4943/text.", - "page": 451 - }, + "url": "https://report.foundationforfreedomonline.com/8-29-22.html", + "page": 3 
+ } + ], + "links_from_original_text": [ { - "url": "https://rm.coe.int/16806f943e.", - "page": 465 - }, + "url": "https://www.cisa.gov/topics/election-security/foreign-influence-operations-and-", + "page": 3 + } + ], + "links_from_text_without_linebreaks": [ { - "url": "https://www.statista.com/statistics/272014/global-social-networks-ranked-by-number-of-users/.", - "page": 488 - }, + "url": "https://www.cisa.gov/topics/election-security/foreign-influence-operations-and-disinformationAll", + "page": 3 + } + ], + "links_from_text_without_spaces": [ { - "url": "https://jamstalldhetsmyndigheten.se/mans-vald-mot-kvinnor/fakta-och-statistik/.", - "page": 496 + "url": "https://www.cisa.gov/topics/election-security/foreign-influence-operations-and-", + "page": 3 } ], - "text_links_total": 31, - "annotation_links_total": 0, - "url": "https://www.regeringen.se/contentassets/35276188514d40fb87785c81c0fbda93/datalagring-och-atkomst-till-elektronisk-information-sou-202322/", - "timeout": 10, - "urls_fixed": [], - "pages_total": 624, - "detected_language": "sv", + "url": "https://www.foundationforfreedomonline.com/wp-content/uploads/2023/03/FFO-FLASH-REPORT-REV.pdf", + "timeout": 2, + "pages_total": 6, + "detected_language": "en", "detected_language_error": false, "detected_language_error_details": "", - "timestamp": 1685740492, - "isodate": "2023-06-02T23:14:52.566974", - "id": "b7f11051" + "debug_text_original": { + "1": "In this first screenshot, the MDM page describes how DHS used to only be involved in\ncensorship work against foreign-based social media opinions. 
Then, the Countering\nForeign Influence Task Force changed its name to generic \u201cMis, Dis and\nMalinformation,\u201d which included domestic-based social media opinions.\nfoundationforfreedomonline.org\n2\nF L A S H R E P O R T\nFOUNDATION FOR\nFREEDOM ONLINE\nFFO has previously covered the Foreign-To-Domestic Censorship Switcheroo\ndescribed in this video found here.\nThe CISA site plainly stated it believed it could take action to neutralize domestic\nspeech online by classifying purveyors of domestic misinformation as \u201cdomestic threat\nactors\u201d on par with someone conducting a traditional cyber-attack.\n", + "2": "foundationforfreedomonline.org\n3\nThe former CISA site went on to proudly tout its role in coordinating the private sector\ncensorship of domestic citizens\u2019 Covid-19 narratives as well:\nF L A S H R E P O R T\nFOUNDATION FOR\nFREEDOM ONLINE\n", + "3": "foundationforfreedomonline.org\n4\nBut sometime last week, between Friday, Feb. 24 at 4:37 p.m. and Sunday, Feb. 
26\nat 5:55 a.m., CISA\u2019s once loud-and-proud declaration of long-arm jurisdiction over\ndomestic opinions online seems to have been walked back.\nThe site page for cisa.gov/mdm now redirects to a generic, foreign-only focused\ncounter-disinfo page:\nhttps://www.cisa.gov/topics/election-security/foreign-influence-operations-and-\ndisinformation\nAll references to the word or concept of \u201cdomestic\u201d inward-facing role of CISA have\nbeen carefully scrubbed:\nF L A S H R E P O R T\nFOUNDATION FOR\nFREEDOM ONLINE\nFFO extensively covered CISA\u2019s domestic censorship of Covid-19 in this report.\nThis is how an obscure cybersecurity subagency tucked within DHS justified making\ncensorship instructional videos like the one pictured below.\n", + "4": "foundationforfreedomonline.org\n5\nThe references to CISA\u2019s censorship of Covid and 2020 election claims have\ndisappeared as well.\nPerhaps CISA hopes to reverse what is now several years of outright government\ncensorship of domestic speech of American citizens. Or perhaps they are simply\nhoping no one will notice, or people will forget.\nF L A S H R E P O R T\nFOUNDATION FOR\nFREEDOM ONLINE\nYou can see here the term \u201cdomestic threat actors\u201d has disappeared altogether:\n", + "5": "The public-private domestic censorship operation coordinated by the federal\ngovernment has quietly been organized to quell the online opinions of everyday\nAmericans. Although DHS began to tout their coordination of such efforts publicly on\ntheir website, groups like Foundation for Freedom Online have exposed the\nbackbone of this taxpayer-funded domestic censorship apparatus. As a result, it is\nno surprise that DHS appears to be backtracking on the public display of their\ndomestic censorship efforts. 
\nCONCLUSION\nF L A S H R E P O R T\nFOUNDATION FOR\nFREEDOM ONLINE\nfoundationforfreedomonline.org\n6\n" + }, + "debug_text_without_linebreaks": { + "0": "DHS Quietly Purges CISA \"Mis, Dis andMalinformation\" Website To RemoveDomestic Censorship ReferencesSince May 1, 2021, CISA.gov/mdm had stood with an open public declaration that itclassified domestic opinions deemed domestic \u201cmisinformation\u201d as an attack on\u201cdemocratic institutions,\u201d and therefore as a category of cyber threat to be neutralizedby DHS\u2019s cyber division, the Cybersecurity and Infrastructure Security Agency (CISA).Provided below are highlighted screenshots of CISA.gov/mdm snapped by theWayback Machine on May 1, 2021. foundationforfreedomonline.org1F L A S H R E P O R TFOUNDATION FORFREEDOM ONLINEKEY TAKEAWAYST H E D E P A R T M E N T O F H O M E L A N DS E C U R I T Y \u2019 S( D H S )P R I M A R Y C E N S O R S H I P C O O R D I N A T I N GA G E N C YH A SQ U I E T L Y P U R G E D W H A T F O R T W O Y E A R S H A D S T O O D A S AP U B L I C C O N F E S S I O N O F T A R G E T I N G U SC I T I Z E N S\u2013\u201c D O M E S T I C T H R E A T A C T O R S \u201d \u2013 W H O P O S T \u201c M I S , D I S O RM A L I N F O R M A T I O N \u201d ( M D M ) O N S O C I A L M E D I A A B O U T C O V I D -1 9 , U S E L E C T I O N I S S U E S , A N D O T H E R C O N T R O V E R S I A LT O P I C S .A F O U N D A T I O N F O R F R E E D O M O N L I N E I N V E S T I G A T I O NO F W A Y B A C K M A C H I N EA R C H I V E S H A S D E T E R M I N E DT H A T L A T E L A S T W E E K , D H S S C R U B B E D A N D R E -D I R E C T E D A L O N G S T A N D I N G W E B S I T E L I N K T H A T W A SH O M E T O T H E D H S C E N S O R S H I P T E A M T H A TC O O R D I N A T E S P R I V A T ES E C T O R \u201c C O U N T E R - D I S I N F O \u201dF I R M S T O M A S S - F L A G S O C I A L M E D I A A C C O U N T S U S I N GD H S \u2019 S \u201c D O M E S T I C D I S I N F O R M A T I O N S W I T 
C H B O A R D . \u201dT H E S C R U B B I N G C O M E SA G A I N S T T H E B A C K D R O P O FM O U N T I N G P U B L I CA W A R E N E S SA N DP R O A C T I V EC O N G R E S S I O N A L I N Q U I R Y A N D S U B P O E N A S I N T O T H EF E D E R A L G O V E R N M E N T \u2019 S R O L E I N D O M E S T I CC E N S O R S H I P1.2.3.FINDINGS", + "1": "In this first screenshot, the MDM page describes how DHS used to only be involved incensorship work against foreign-based social media opinions. Then, the CounteringForeign Influence Task Force changed its name to generic \u201cMis, Dis andMalinformation,\u201d which included domestic-based social media opinions.foundationforfreedomonline.org2F L A S H R E P O R TFOUNDATION FORFREEDOM ONLINEFFO has previously covered the Foreign-To-Domestic Censorship Switcheroodescribed in this video found here.The CISA site plainly stated it believed it could take action to neutralize domesticspeech online by classifying purveyors of domestic misinformation as \u201cdomestic threatactors\u201d on par with someone conducting a traditional cyber-attack.", + "2": "foundationforfreedomonline.org3The former CISA site went on to proudly tout its role in coordinating the private sectorcensorship of domestic citizens\u2019 Covid-19 narratives as well:F L A S H R E P O R TFOUNDATION FORFREEDOM ONLINE", + "3": "foundationforfreedomonline.org4But sometime last week, between Friday, Feb. 24 at 4:37 p.m. and Sunday, Feb. 
26at 5:55 a.m., CISA\u2019s once loud-and-proud declaration of long-arm jurisdiction overdomestic opinions online seems to have been walked back.The site page for cisa.gov/mdm now redirects to a generic, foreign-only focusedcounter-disinfo page:https://www.cisa.gov/topics/election-security/foreign-influence-operations-and-disinformationAll references to the word or concept of \u201cdomestic\u201d inward-facing role of CISA havebeen carefully scrubbed:F L A S H R E P O R TFOUNDATION FORFREEDOM ONLINEFFO extensively covered CISA\u2019s domestic censorship of Covid-19 in this report.This is how an obscure cybersecurity subagency tucked within DHS justified makingcensorship instructional videos like the one pictured below.", + "4": "foundationforfreedomonline.org5The references to CISA\u2019s censorship of Covid and 2020 election claims havedisappeared as well.Perhaps CISA hopes to reverse what is now several years of outright governmentcensorship of domestic speech of American citizens. Or perhaps they are simplyhoping no one will notice, or people will forget.F L A S H R E P O R TFOUNDATION FORFREEDOM ONLINEYou can see here the term \u201cdomestic threat actors\u201d has disappeared altogether:", + "5": "The public-private domestic censorship operation coordinated by the federalgovernment has quietly been organized to quell the online opinions of everydayAmericans. Although DHS began to tout their coordination of such efforts publicly ontheir website, groups like Foundation for Freedom Online have exposed thebackbone of this taxpayer-funded domestic censorship apparatus. As a result, it isno surprise that DHS appears to be backtracking on the public display of theirdomestic censorship efforts. 
CONCLUSIONF L A S H R E P O R TFOUNDATION FORFREEDOM ONLINEfoundationforfreedomonline.org6" + }, + "debug_text_without_spaces": { + "2": "foundationforfreedomonline.org\n3\nTheformerCISAsitewentontoproudlytoutitsroleincoordinatingtheprivatesector\ncensorshipofdomesticcitizens\u2019Covid-19narrativesaswell:\nFLASHREPORT\nFOUNDATIONFOR\nFREEDOMONLINE\n", + "3": "foundationforfreedomonline.org\n4\nButsometimelastweek,betweenFriday,Feb.24at4:37p.m.andSunday,Feb.26\nat5:55a.m.,CISA\u2019sonceloud-and-prouddeclarationoflong-armjurisdictionover\ndomesticopinionsonlineseemstohavebeenwalkedback.\nThesitepageforcisa.gov/mdmnowredirectstoageneric,foreign-onlyfocused\ncounter-disinfopage:\nhttps://www.cisa.gov/topics/election-security/foreign-influence-operations-and-\ndisinformation\nAllreferencestothewordorconceptof\u201cdomestic\u201dinward-facingroleofCISAhave\nbeencarefullyscrubbed:\nFLASHREPORT\nFOUNDATIONFOR\nFREEDOMONLINE\nFFOextensivelycoveredCISA\u2019sdomesticcensorshipofCovid-19inthisreport.\nThisishowanobscurecybersecuritysubagencytuckedwithinDHSjustifiedmaking\ncensorshipinstructionalvideosliketheonepicturedbelow.\n", + "4": "foundationforfreedomonline.org\n5\nThereferencestoCISA\u2019scensorshipofCovidand2020electionclaimshave\ndisappearedaswell.\nPerhapsCISAhopestoreversewhatisnowseveralyearsofoutrightgovernment\ncensorshipofdomesticspeechofAmericancitizens.Orperhapstheyaresimply\nhopingnoonewillnotice,orpeoplewillforget.\nFLASHREPORT\nFOUNDATIONFOR\nFREEDOMONLINE\nYoucanseeheretheterm\u201cdomesticthreatactors\u201dhasdisappearedaltogether:\n", + "5": 
"Thepublic-privatedomesticcensorshipoperationcoordinatedbythefederal\ngovernmenthasquietlybeenorganizedtoquelltheonlineopinionsofeveryday\nAmericans.AlthoughDHSbegantotouttheircoordinationofsucheffortspubliclyon\ntheirwebsite,groupslikeFoundationforFreedomOnlinehaveexposedthe\nbackboneofthistaxpayer-fundeddomesticcensorshipapparatus.Asaresult,itis\nnosurprisethatDHSappearstobebacktrackingonthepublicdisplayoftheir\ndomesticcensorshipefforts.\nCONCLUSION\nFLASHREPORT\nFOUNDATIONFOR\nFREEDOMONLINE\nfoundationforfreedomonline.org\n6\n" + }, + "debug_url_annotations": { + "0": [ + { + "kind": 2, + "xref": 130, + "from": "Rect(112.01813507080078, 691.85791015625, 117.26372528076172, 706.8453369140625)", + "uri": "https://web.archive.org/web/20210501230502/cisa.gov/mdm", + "id": "" + }, + { + "kind": 2, + "xref": 131, + "from": "Rect(117.26372528076172, 691.85791015625, 192.20074462890625, 706.8453369140625)", + "uri": "https://web.archive.org/web/20210501230502/cisa.gov/mdm", + "id": "" + }, + { + "kind": 2, + "xref": 132, + "from": "Rect(347.7430419921875, 264.389404296875, 438.4168395996094, 277.8780517578125)", + "uri": "https://web.archive.org/web/20210501230502/cisa.gov/mdm", + "id": "" + }, + { + "kind": 2, + "xref": 133, + "from": "Rect(119.93448638916016, 294.3642578125, 361.231689453125, 307.8529052734375)", + "uri": "https://web.archive.org/web/20210501230502/cisa.gov/mdm", + "id": "" + }, + { + "kind": 2, + "xref": 134, + "from": "Rect(119.93448638916016, 579.2968139648438, 494.6195983886719, 594.2842407226562)", + "uri": "https://judiciary.house.gov/media/press-releases/chairman-jim-jordan-subpoenas-big-tech-executives", + "id": "" + } + ], + "1": [ + { + "kind": 2, + "xref": 51, + "from": "Rect(249.50668334960938, 665.026123046875, 281.7295837402344, 680.0134887695312)", + "uri": "https://rumble.com/v1gx8h7-dhss-foreign-to-domestic-disinformation-switcheroo.html", + "id": "" + } + ], + "3": [ + { + "kind": 2, + "xref": 53, + "from": "Rect(280.970703125, 
459.9183654785156, 442.0852966308594, 474.90576171875)", + "uri": "https://web.archive.org/web/20230224163731/cisa.gov/mdm", + "id": "" + }, + { + "kind": 2, + "xref": 54, + "from": "Rect(75.64327239990234, 575.3213500976562, 543.9996337890625, 606.794921875)", + "uri": "https://www.cisa.gov/topics/election-security/foreign-influence-operations-and-disinformation", + "id": "" + }, + { + "kind": 2, + "xref": 55, + "from": "Rect(460.0701904296875, 78.75225830078125, 535.756591796875, 93.73968505859375)", + "uri": "https://report.foundationforfreedomonline.com/8-29-22.html", + "id": "" + } + ] + }, + "characters": 5161, + "timestamp": 1685903956, + "isodate": "2023-06-04T20:39:16.906306", + "id": "a07f3f88", + "refreshed_now": true } ``` -Using the debug parameter, all the text before and after cleaning is exposed as well as the link-annotations before url extraction. -This output permits the data consumer to count number of links per page, which links or domains appear most, etc. +Using the debug and refresh parameters, all the text before and after cleaning is exposed as well as the link-annotations +before URL extraction. + +Note: Setting the debug=true parameter without refresh=true will often not yield any debug output since we don't have it +stored in the cache. + +This output permits the data consumer to count the number of links per page, which links or domains appear most, the number of +characters in the PDF, etc. #### Known limitations -The extraction of URLs from unstructured text in any random PDF is a difficult thing to do reliably. This is because PDF is not a structured data format with clear boundaries between different type of information. -We currently use PyMuPDF to extract all the text as it appears on the page, remove all linebreaks and a use a regex to extract anything looking like a URL. This approach results in some links containing information that was part of the text after.
+The extraction of URLs from unstructured text in any random PDF is a +difficult thing to do reliably. This is because +PDF is not a structured data format with clear boundaries between different types of information. + +We currently use PyMuPDF to extract all the text as it appears on the page. +We clean the text in various ways to get the links out via a regex. +This approach results in some links containing information that was part of the +text after the link ended, and we have yet to find a way to reliably determine +the end boundary of links. -We are currently investigating if using Machine Learning could improve the output. +We are currently investigating if using the HTML output of PyMuPDF or +using advanced Machine Learning could improve the output. You are very welcome to suggest improvements by opening an issue or sending a pull request. :) ### XHTML + the statistics/xhtml endpoint accepts the following parameters: + * url (mandatory) * refresh (optional) * testing (optional) @@ -439,6 +524,7 @@ the statistics/xhtml endpoint accepts the following parameters: On error it returns 400. It will return json similar to: + ``` { "links": [ @@ -462,10 +548,13 @@ It will return json similar to: "refreshed_now": false } ``` + #### Known limitations + None # Installation + Clone the git repo: `$ git clone https://github.com/internetarchive/iari.git && cd iari` @@ -473,20 +562,23 @@ Clone the git repo: We recommend checking out the latest release before proceeding. ## Requirements + * python pip * python gunicorn -* python poetry - +* python poetry + ## Setup + We use pip and poetry to set everything up. `$ pip install poetry gunicorn && poetry install` Lastly, set up the directories for the json cache files -`$ ./setup_json_directories.sh` - +`$ ./setup_json_directories.sh` + ## Run + Run these commands in different shells or in GNU screen.
Start GNU screen (if you want to have a persisting session) @@ -494,6 +586,7 @@ Start GNU screen (if you want to have a persisting session) `$ screen -D -RR` ### Development mode + Run it with `$ ./run-debug-api.sh` @@ -501,6 +594,7 @@ Test it in another Screen window or local terminal with `$ curl -i "localhost:5000/v2/statistics/article?regex=external%20links&url=https://en.wikipedia.org/wiki/Test"` ### Production mode + Run it with `$ ./run-api.sh` @@ -508,24 +602,31 @@ Test it in another Screen window or local terminal with `$ curl -i "localhost:8000/v2/statistics/article?regex=external%20links&url=https://en.wikipedia.org/wiki/Test"` # Deployed instances + See [KNOWN_DEPLOYED_INSTANCES.md](KNOWN_DEPLOYED_INSTANCES.md) # Diagrams + ## IARI + ### Components + ![image](diagrams/components.png) # History of this repo + * version 1.0.0 a proof of concept import tool based on WikidataIntegrator (by James Hare) * version 2.0.0+ a scalable ETL-framework with an API and capability of reading EventStreams (by Dennis Priskorn) * version 3.0.0+ WARI, a host of API endpoints that returns statistics -about a Wikipedia article and its references. (by Dennis Priskorn) + about a Wikipedia article and its references. (by Dennis Priskorn) * version 4.0.0+ IARI, a host of API endpoints that returns statistics -about both Wikipedia articles and its references and can extract links from any PDF or XHTML page. (by Dennis Priskor + about both Wikipedia articles and its references and can extract links from any PDF or XHTML page. (by Dennis Priskor # License + This project is licensed under GPLv3+. Copyright Dennis Priskorn 2022 The diagram PNG files are CC0. 
# Further reading and installation/setup + See the [development notes](DEVELOPMENT_NOTES.md) diff --git a/poetry.lock b/poetry.lock index 90009d22..67df0929 100644 --- a/poetry.lock +++ b/poetry.lock @@ -618,14 +618,14 @@ dotenv = ["python-dotenv"] [[package]] name = "flask-restful" -version = "0.3.9" +version = "0.3.10" description = "Simple framework for creating REST APIs" category = "main" optional = false python-versions = "*" files = [ - {file = "Flask-RESTful-0.3.9.tar.gz", hash = "sha256:ccec650b835d48192138c85329ae03735e6ced58e9b2d9c2146d6c84c06fa53e"}, - {file = "Flask_RESTful-0.3.9-py2.py3-none-any.whl", hash = "sha256:4970c49b6488e46c520b325f54833374dc2b98e211f1b272bd4b0c516232afe2"}, + {file = "Flask-RESTful-0.3.10.tar.gz", hash = "sha256:fe4af2ef0027df8f9b4f797aba20c5566801b6ade995ac63b588abf1a59cec37"}, + {file = "Flask_RESTful-0.3.10-py2.py3-none-any.whl", hash = "sha256:1cf93c535172f112e080b0d4503a8d15f93a48c88bdd36dd87269bdaf405051b"}, ] [package.dependencies] @@ -941,62 +941,62 @@ source = ["Cython (>=0.29.7)"] [[package]] name = "markupsafe" -version = "2.1.2" +version = "2.1.3" description = "Safely add untrusted strings to HTML/XML markup." 
category = "main" optional = false python-versions = ">=3.7" files = [ - {file = "MarkupSafe-2.1.2-cp310-cp310-macosx_10_9_universal2.whl", hash = "sha256:665a36ae6f8f20a4676b53224e33d456a6f5a72657d9c83c2aa00765072f31f7"}, - {file = "MarkupSafe-2.1.2-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:340bea174e9761308703ae988e982005aedf427de816d1afe98147668cc03036"}, - {file = "MarkupSafe-2.1.2-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:22152d00bf4a9c7c83960521fc558f55a1adbc0631fbb00a9471e097b19d72e1"}, - {file = "MarkupSafe-2.1.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:28057e985dace2f478e042eaa15606c7efccb700797660629da387eb289b9323"}, - {file = "MarkupSafe-2.1.2-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:ca244fa73f50a800cf8c3ebf7fd93149ec37f5cb9596aa8873ae2c1d23498601"}, - {file = "MarkupSafe-2.1.2-cp310-cp310-musllinux_1_1_aarch64.whl", hash = "sha256:d9d971ec1e79906046aa3ca266de79eac42f1dbf3612a05dc9368125952bd1a1"}, - {file = "MarkupSafe-2.1.2-cp310-cp310-musllinux_1_1_i686.whl", hash = "sha256:7e007132af78ea9df29495dbf7b5824cb71648d7133cf7848a2a5dd00d36f9ff"}, - {file = "MarkupSafe-2.1.2-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:7313ce6a199651c4ed9d7e4cfb4aa56fe923b1adf9af3b420ee14e6d9a73df65"}, - {file = "MarkupSafe-2.1.2-cp310-cp310-win32.whl", hash = "sha256:c4a549890a45f57f1ebf99c067a4ad0cb423a05544accaf2b065246827ed9603"}, - {file = "MarkupSafe-2.1.2-cp310-cp310-win_amd64.whl", hash = "sha256:835fb5e38fd89328e9c81067fd642b3593c33e1e17e2fdbf77f5676abb14a156"}, - {file = "MarkupSafe-2.1.2-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:2ec4f2d48ae59bbb9d1f9d7efb9236ab81429a764dedca114f5fdabbc3788013"}, - {file = "MarkupSafe-2.1.2-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:608e7073dfa9e38a85d38474c082d4281f4ce276ac0010224eaba11e929dd53a"}, - {file = 
"MarkupSafe-2.1.2-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:65608c35bfb8a76763f37036547f7adfd09270fbdbf96608be2bead319728fcd"}, - {file = "MarkupSafe-2.1.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:f2bfb563d0211ce16b63c7cb9395d2c682a23187f54c3d79bfec33e6705473c6"}, - {file = "MarkupSafe-2.1.2-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:da25303d91526aac3672ee6d49a2f3db2d9502a4a60b55519feb1a4c7714e07d"}, - {file = "MarkupSafe-2.1.2-cp311-cp311-musllinux_1_1_aarch64.whl", hash = "sha256:9cad97ab29dfc3f0249b483412c85c8ef4766d96cdf9dcf5a1e3caa3f3661cf1"}, - {file = "MarkupSafe-2.1.2-cp311-cp311-musllinux_1_1_i686.whl", hash = "sha256:085fd3201e7b12809f9e6e9bc1e5c96a368c8523fad5afb02afe3c051ae4afcc"}, - {file = "MarkupSafe-2.1.2-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:1bea30e9bf331f3fef67e0a3877b2288593c98a21ccb2cf29b74c581a4eb3af0"}, - {file = "MarkupSafe-2.1.2-cp311-cp311-win32.whl", hash = "sha256:7df70907e00c970c60b9ef2938d894a9381f38e6b9db73c5be35e59d92e06625"}, - {file = "MarkupSafe-2.1.2-cp311-cp311-win_amd64.whl", hash = "sha256:e55e40ff0cc8cc5c07996915ad367fa47da6b3fc091fdadca7f5403239c5fec3"}, - {file = "MarkupSafe-2.1.2-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:a6e40afa7f45939ca356f348c8e23048e02cb109ced1eb8420961b2f40fb373a"}, - {file = "MarkupSafe-2.1.2-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:cf877ab4ed6e302ec1d04952ca358b381a882fbd9d1b07cccbfd61783561f98a"}, - {file = "MarkupSafe-2.1.2-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:63ba06c9941e46fa389d389644e2d8225e0e3e5ebcc4ff1ea8506dce646f8c8a"}, - {file = "MarkupSafe-2.1.2-cp37-cp37m-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:f1cd098434e83e656abf198f103a8207a8187c0fc110306691a2e94a78d0abb2"}, - {file = 
"MarkupSafe-2.1.2-cp37-cp37m-musllinux_1_1_aarch64.whl", hash = "sha256:55f44b440d491028addb3b88f72207d71eeebfb7b5dbf0643f7c023ae1fba619"}, - {file = "MarkupSafe-2.1.2-cp37-cp37m-musllinux_1_1_i686.whl", hash = "sha256:a6f2fcca746e8d5910e18782f976489939d54a91f9411c32051b4aab2bd7c513"}, - {file = "MarkupSafe-2.1.2-cp37-cp37m-musllinux_1_1_x86_64.whl", hash = "sha256:0b462104ba25f1ac006fdab8b6a01ebbfbce9ed37fd37fd4acd70c67c973e460"}, - {file = "MarkupSafe-2.1.2-cp37-cp37m-win32.whl", hash = "sha256:7668b52e102d0ed87cb082380a7e2e1e78737ddecdde129acadb0eccc5423859"}, - {file = "MarkupSafe-2.1.2-cp37-cp37m-win_amd64.whl", hash = "sha256:6d6607f98fcf17e534162f0709aaad3ab7a96032723d8ac8750ffe17ae5a0666"}, - {file = "MarkupSafe-2.1.2-cp38-cp38-macosx_10_9_universal2.whl", hash = "sha256:a806db027852538d2ad7555b203300173dd1b77ba116de92da9afbc3a3be3eed"}, - {file = "MarkupSafe-2.1.2-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:a4abaec6ca3ad8660690236d11bfe28dfd707778e2442b45addd2f086d6ef094"}, - {file = "MarkupSafe-2.1.2-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:f03a532d7dee1bed20bc4884194a16160a2de9ffc6354b3878ec9682bb623c54"}, - {file = "MarkupSafe-2.1.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:4cf06cdc1dda95223e9d2d3c58d3b178aa5dacb35ee7e3bbac10e4e1faacb419"}, - {file = "MarkupSafe-2.1.2-cp38-cp38-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:22731d79ed2eb25059ae3df1dfc9cb1546691cc41f4e3130fe6bfbc3ecbbecfa"}, - {file = "MarkupSafe-2.1.2-cp38-cp38-musllinux_1_1_aarch64.whl", hash = "sha256:f8ffb705ffcf5ddd0e80b65ddf7bed7ee4f5a441ea7d3419e861a12eaf41af58"}, - {file = "MarkupSafe-2.1.2-cp38-cp38-musllinux_1_1_i686.whl", hash = "sha256:8db032bf0ce9022a8e41a22598eefc802314e81b879ae093f36ce9ddf39ab1ba"}, - {file = "MarkupSafe-2.1.2-cp38-cp38-musllinux_1_1_x86_64.whl", hash = "sha256:2298c859cfc5463f1b64bd55cb3e602528db6fa0f3cfd568d3605c50678f8f03"}, - {file = 
"MarkupSafe-2.1.2-cp38-cp38-win32.whl", hash = "sha256:50c42830a633fa0cf9e7d27664637532791bfc31c731a87b202d2d8ac40c3ea2"}, - {file = "MarkupSafe-2.1.2-cp38-cp38-win_amd64.whl", hash = "sha256:bb06feb762bade6bf3c8b844462274db0c76acc95c52abe8dbed28ae3d44a147"}, - {file = "MarkupSafe-2.1.2-cp39-cp39-macosx_10_9_universal2.whl", hash = "sha256:99625a92da8229df6d44335e6fcc558a5037dd0a760e11d84be2260e6f37002f"}, - {file = "MarkupSafe-2.1.2-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:8bca7e26c1dd751236cfb0c6c72d4ad61d986e9a41bbf76cb445f69488b2a2bd"}, - {file = "MarkupSafe-2.1.2-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:40627dcf047dadb22cd25ea7ecfe9cbf3bbbad0482ee5920b582f3809c97654f"}, - {file = "MarkupSafe-2.1.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:40dfd3fefbef579ee058f139733ac336312663c6706d1163b82b3003fb1925c4"}, - {file = "MarkupSafe-2.1.2-cp39-cp39-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:090376d812fb6ac5f171e5938e82e7f2d7adc2b629101cec0db8b267815c85e2"}, - {file = "MarkupSafe-2.1.2-cp39-cp39-musllinux_1_1_aarch64.whl", hash = "sha256:2e7821bffe00aa6bd07a23913b7f4e01328c3d5cc0b40b36c0bd81d362faeb65"}, - {file = "MarkupSafe-2.1.2-cp39-cp39-musllinux_1_1_i686.whl", hash = "sha256:c0a33bc9f02c2b17c3ea382f91b4db0e6cde90b63b296422a939886a7a80de1c"}, - {file = "MarkupSafe-2.1.2-cp39-cp39-musllinux_1_1_x86_64.whl", hash = "sha256:b8526c6d437855442cdd3d87eede9c425c4445ea011ca38d937db299382e6fa3"}, - {file = "MarkupSafe-2.1.2-cp39-cp39-win32.whl", hash = "sha256:137678c63c977754abe9086a3ec011e8fd985ab90631145dfb9294ad09c102a7"}, - {file = "MarkupSafe-2.1.2-cp39-cp39-win_amd64.whl", hash = "sha256:0576fe974b40a400449768941d5d0858cc624e3249dfd1e0c33674e5c7ca7aed"}, - {file = "MarkupSafe-2.1.2.tar.gz", hash = "sha256:abcabc8c2b26036d62d4c746381a6f7cf60aafcc653198ad678306986b09450d"}, + {file = 
"MarkupSafe-2.1.3-cp310-cp310-macosx_10_9_universal2.whl", hash = "sha256:cd0f502fe016460680cd20aaa5a76d241d6f35a1c3350c474bac1273803893fa"}, + {file = "MarkupSafe-2.1.3-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:e09031c87a1e51556fdcb46e5bd4f59dfb743061cf93c4d6831bf894f125eb57"}, + {file = "MarkupSafe-2.1.3-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:68e78619a61ecf91e76aa3e6e8e33fc4894a2bebe93410754bd28fce0a8a4f9f"}, + {file = "MarkupSafe-2.1.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:65c1a9bcdadc6c28eecee2c119465aebff8f7a584dd719facdd9e825ec61ab52"}, + {file = "MarkupSafe-2.1.3-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:525808b8019e36eb524b8c68acdd63a37e75714eac50e988180b169d64480a00"}, + {file = "MarkupSafe-2.1.3-cp310-cp310-musllinux_1_1_aarch64.whl", hash = "sha256:962f82a3086483f5e5f64dbad880d31038b698494799b097bc59c2edf392fce6"}, + {file = "MarkupSafe-2.1.3-cp310-cp310-musllinux_1_1_i686.whl", hash = "sha256:aa7bd130efab1c280bed0f45501b7c8795f9fdbeb02e965371bbef3523627779"}, + {file = "MarkupSafe-2.1.3-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:c9c804664ebe8f83a211cace637506669e7890fec1b4195b505c214e50dd4eb7"}, + {file = "MarkupSafe-2.1.3-cp310-cp310-win32.whl", hash = "sha256:10bbfe99883db80bdbaff2dcf681dfc6533a614f700da1287707e8a5d78a8431"}, + {file = "MarkupSafe-2.1.3-cp310-cp310-win_amd64.whl", hash = "sha256:1577735524cdad32f9f694208aa75e422adba74f1baee7551620e43a3141f559"}, + {file = "MarkupSafe-2.1.3-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:ad9e82fb8f09ade1c3e1b996a6337afac2b8b9e365f926f5a61aacc71adc5b3c"}, + {file = "MarkupSafe-2.1.3-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:3c0fae6c3be832a0a0473ac912810b2877c8cb9d76ca48de1ed31e1c68386575"}, + {file = "MarkupSafe-2.1.3-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = 
"sha256:b076b6226fb84157e3f7c971a47ff3a679d837cf338547532ab866c57930dbee"}, + {file = "MarkupSafe-2.1.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:bfce63a9e7834b12b87c64d6b155fdd9b3b96191b6bd334bf37db7ff1fe457f2"}, + {file = "MarkupSafe-2.1.3-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:338ae27d6b8745585f87218a3f23f1512dbf52c26c28e322dbe54bcede54ccb9"}, + {file = "MarkupSafe-2.1.3-cp311-cp311-musllinux_1_1_aarch64.whl", hash = "sha256:e4dd52d80b8c83fdce44e12478ad2e85c64ea965e75d66dbeafb0a3e77308fcc"}, + {file = "MarkupSafe-2.1.3-cp311-cp311-musllinux_1_1_i686.whl", hash = "sha256:df0be2b576a7abbf737b1575f048c23fb1d769f267ec4358296f31c2479db8f9"}, + {file = "MarkupSafe-2.1.3-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:5bbe06f8eeafd38e5d0a4894ffec89378b6c6a625ff57e3028921f8ff59318ac"}, + {file = "MarkupSafe-2.1.3-cp311-cp311-win32.whl", hash = "sha256:dd15ff04ffd7e05ffcb7fe79f1b98041b8ea30ae9234aed2a9168b5797c3effb"}, + {file = "MarkupSafe-2.1.3-cp311-cp311-win_amd64.whl", hash = "sha256:134da1eca9ec0ae528110ccc9e48041e0828d79f24121a1a146161103c76e686"}, + {file = "MarkupSafe-2.1.3-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:8e254ae696c88d98da6555f5ace2279cf7cd5b3f52be2b5cf97feafe883b58d2"}, + {file = "MarkupSafe-2.1.3-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:cb0932dc158471523c9637e807d9bfb93e06a95cbf010f1a38b98623b929ef2b"}, + {file = "MarkupSafe-2.1.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:9402b03f1a1b4dc4c19845e5c749e3ab82d5078d16a2a4c2cd2df62d57bb0707"}, + {file = "MarkupSafe-2.1.3-cp37-cp37m-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:ca379055a47383d02a5400cb0d110cef0a776fc644cda797db0c5696cfd7e18e"}, + {file = "MarkupSafe-2.1.3-cp37-cp37m-musllinux_1_1_aarch64.whl", hash = 
"sha256:b7ff0f54cb4ff66dd38bebd335a38e2c22c41a8ee45aa608efc890ac3e3931bc"}, + {file = "MarkupSafe-2.1.3-cp37-cp37m-musllinux_1_1_i686.whl", hash = "sha256:c011a4149cfbcf9f03994ec2edffcb8b1dc2d2aede7ca243746df97a5d41ce48"}, + {file = "MarkupSafe-2.1.3-cp37-cp37m-musllinux_1_1_x86_64.whl", hash = "sha256:56d9f2ecac662ca1611d183feb03a3fa4406469dafe241673d521dd5ae92a155"}, + {file = "MarkupSafe-2.1.3-cp37-cp37m-win32.whl", hash = "sha256:8758846a7e80910096950b67071243da3e5a20ed2546e6392603c096778d48e0"}, + {file = "MarkupSafe-2.1.3-cp37-cp37m-win_amd64.whl", hash = "sha256:787003c0ddb00500e49a10f2844fac87aa6ce977b90b0feaaf9de23c22508b24"}, + {file = "MarkupSafe-2.1.3-cp38-cp38-macosx_10_9_universal2.whl", hash = "sha256:2ef12179d3a291be237280175b542c07a36e7f60718296278d8593d21ca937d4"}, + {file = "MarkupSafe-2.1.3-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:2c1b19b3aaacc6e57b7e25710ff571c24d6c3613a45e905b1fde04d691b98ee0"}, + {file = "MarkupSafe-2.1.3-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:8afafd99945ead6e075b973fefa56379c5b5c53fd8937dad92c662da5d8fd5ee"}, + {file = "MarkupSafe-2.1.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:8c41976a29d078bb235fea9b2ecd3da465df42a562910f9022f1a03107bd02be"}, + {file = "MarkupSafe-2.1.3-cp38-cp38-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:d080e0a5eb2529460b30190fcfcc4199bd7f827663f858a226a81bc27beaa97e"}, + {file = "MarkupSafe-2.1.3-cp38-cp38-musllinux_1_1_aarch64.whl", hash = "sha256:69c0f17e9f5a7afdf2cc9fb2d1ce6aabdb3bafb7f38017c0b77862bcec2bbad8"}, + {file = "MarkupSafe-2.1.3-cp38-cp38-musllinux_1_1_i686.whl", hash = "sha256:504b320cd4b7eff6f968eddf81127112db685e81f7e36e75f9f84f0df46041c3"}, + {file = "MarkupSafe-2.1.3-cp38-cp38-musllinux_1_1_x86_64.whl", hash = "sha256:42de32b22b6b804f42c5d98be4f7e5e977ecdd9ee9b660fda1a3edf03b11792d"}, + {file = "MarkupSafe-2.1.3-cp38-cp38-win32.whl", hash = 
"sha256:ceb01949af7121f9fc39f7d27f91be8546f3fb112c608bc4029aef0bab86a2a5"}, + {file = "MarkupSafe-2.1.3-cp38-cp38-win_amd64.whl", hash = "sha256:1b40069d487e7edb2676d3fbdb2b0829ffa2cd63a2ec26c4938b2d34391b4ecc"}, + {file = "MarkupSafe-2.1.3-cp39-cp39-macosx_10_9_universal2.whl", hash = "sha256:8023faf4e01efadfa183e863fefde0046de576c6f14659e8782065bcece22198"}, + {file = "MarkupSafe-2.1.3-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:6b2b56950d93e41f33b4223ead100ea0fe11f8e6ee5f641eb753ce4b77a7042b"}, + {file = "MarkupSafe-2.1.3-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:9dcdfd0eaf283af041973bff14a2e143b8bd64e069f4c383416ecd79a81aab58"}, + {file = "MarkupSafe-2.1.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:05fb21170423db021895e1ea1e1f3ab3adb85d1c2333cbc2310f2a26bc77272e"}, + {file = "MarkupSafe-2.1.3-cp39-cp39-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:282c2cb35b5b673bbcadb33a585408104df04f14b2d9b01d4c345a3b92861c2c"}, + {file = "MarkupSafe-2.1.3-cp39-cp39-musllinux_1_1_aarch64.whl", hash = "sha256:ab4a0df41e7c16a1392727727e7998a467472d0ad65f3ad5e6e765015df08636"}, + {file = "MarkupSafe-2.1.3-cp39-cp39-musllinux_1_1_i686.whl", hash = "sha256:7ef3cb2ebbf91e330e3bb937efada0edd9003683db6b57bb108c4001f37a02ea"}, + {file = "MarkupSafe-2.1.3-cp39-cp39-musllinux_1_1_x86_64.whl", hash = "sha256:0a4e4a1aff6c7ac4cd55792abf96c915634c2b97e3cc1c7129578aa68ebd754e"}, + {file = "MarkupSafe-2.1.3-cp39-cp39-win32.whl", hash = "sha256:fec21693218efe39aa7f8599346e90c705afa52c5b31ae019b2e57e8f6542bb2"}, + {file = "MarkupSafe-2.1.3-cp39-cp39-win_amd64.whl", hash = "sha256:3fd4abcb888d15a94f32b75d8fd18ee162ca0c064f35b11134be77050296d6ba"}, + {file = "MarkupSafe-2.1.3.tar.gz", hash = "sha256:af598ed32d6ae86f1b747b82783958b1a4ab8f617b06fe68795c7f026abbdcad"}, ] [[package]] @@ -1166,38 +1166,38 @@ files = [ [[package]] name = "mypy" -version = "1.2.0" +version = "1.3.0" 
description = "Optional static typing for Python" category = "dev" optional = false python-versions = ">=3.7" files = [ - {file = "mypy-1.2.0-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:701189408b460a2ff42b984e6bd45c3f41f0ac9f5f58b8873bbedc511900086d"}, - {file = "mypy-1.2.0-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:fe91be1c51c90e2afe6827601ca14353bbf3953f343c2129fa1e247d55fd95ba"}, - {file = "mypy-1.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:8d26b513225ffd3eacece727f4387bdce6469192ef029ca9dd469940158bc89e"}, - {file = "mypy-1.2.0-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:3a2d219775a120581a0ae8ca392b31f238d452729adbcb6892fa89688cb8306a"}, - {file = "mypy-1.2.0-cp310-cp310-win_amd64.whl", hash = "sha256:2e93a8a553e0394b26c4ca683923b85a69f7ccdc0139e6acd1354cc884fe0128"}, - {file = "mypy-1.2.0-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:3efde4af6f2d3ccf58ae825495dbb8d74abd6d176ee686ce2ab19bd025273f41"}, - {file = "mypy-1.2.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:695c45cea7e8abb6f088a34a6034b1d273122e5530aeebb9c09626cea6dca4cb"}, - {file = "mypy-1.2.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:d0e9464a0af6715852267bf29c9553e4555b61f5904a4fc538547a4d67617937"}, - {file = "mypy-1.2.0-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:8293a216e902ac12779eb7a08f2bc39ec6c878d7c6025aa59464e0c4c16f7eb9"}, - {file = "mypy-1.2.0-cp311-cp311-win_amd64.whl", hash = "sha256:f46af8d162f3d470d8ffc997aaf7a269996d205f9d746124a179d3abe05ac602"}, - {file = "mypy-1.2.0-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:031fc69c9a7e12bcc5660b74122ed84b3f1c505e762cc4296884096c6d8ee140"}, - {file = "mypy-1.2.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:390bc685ec209ada4e9d35068ac6988c60160b2b703072d2850457b62499e336"}, - {file = "mypy-1.2.0-cp37-cp37m-musllinux_1_1_x86_64.whl", hash = 
"sha256:4b41412df69ec06ab141808d12e0bf2823717b1c363bd77b4c0820feaa37249e"}, - {file = "mypy-1.2.0-cp37-cp37m-win_amd64.whl", hash = "sha256:4e4a682b3f2489d218751981639cffc4e281d548f9d517addfd5a2917ac78119"}, - {file = "mypy-1.2.0-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:a197ad3a774f8e74f21e428f0de7f60ad26a8d23437b69638aac2764d1e06a6a"}, - {file = "mypy-1.2.0-cp38-cp38-macosx_11_0_arm64.whl", hash = "sha256:c9a084bce1061e55cdc0493a2ad890375af359c766b8ac311ac8120d3a472950"}, - {file = "mypy-1.2.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:eaeaa0888b7f3ccb7bcd40b50497ca30923dba14f385bde4af78fac713d6d6f6"}, - {file = "mypy-1.2.0-cp38-cp38-musllinux_1_1_x86_64.whl", hash = "sha256:bea55fc25b96c53affab852ad94bf111a3083bc1d8b0c76a61dd101d8a388cf5"}, - {file = "mypy-1.2.0-cp38-cp38-win_amd64.whl", hash = "sha256:4c8d8c6b80aa4a1689f2a179d31d86ae1367ea4a12855cc13aa3ba24bb36b2d8"}, - {file = "mypy-1.2.0-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:70894c5345bea98321a2fe84df35f43ee7bb0feec117a71420c60459fc3e1eed"}, - {file = "mypy-1.2.0-cp39-cp39-macosx_11_0_arm64.whl", hash = "sha256:4a99fe1768925e4a139aace8f3fb66db3576ee1c30b9c0f70f744ead7e329c9f"}, - {file = "mypy-1.2.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:023fe9e618182ca6317ae89833ba422c411469156b690fde6a315ad10695a521"}, - {file = "mypy-1.2.0-cp39-cp39-musllinux_1_1_x86_64.whl", hash = "sha256:4d19f1a239d59f10fdc31263d48b7937c585810288376671eaf75380b074f238"}, - {file = "mypy-1.2.0-cp39-cp39-win_amd64.whl", hash = "sha256:2de7babe398cb7a85ac7f1fd5c42f396c215ab3eff731b4d761d68d0f6a80f48"}, - {file = "mypy-1.2.0-py3-none-any.whl", hash = "sha256:d8e9187bfcd5ffedbe87403195e1fc340189a68463903c39e2b63307c9fa0394"}, - {file = "mypy-1.2.0.tar.gz", hash = "sha256:f70a40410d774ae23fcb4afbbeca652905a04de7948eaf0b1789c8d1426b72d1"}, + {file = "mypy-1.3.0-cp310-cp310-macosx_10_9_x86_64.whl", hash = 
"sha256:c1eb485cea53f4f5284e5baf92902cd0088b24984f4209e25981cc359d64448d"}, + {file = "mypy-1.3.0-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:4c99c3ecf223cf2952638da9cd82793d8f3c0c5fa8b6ae2b2d9ed1e1ff51ba85"}, + {file = "mypy-1.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:550a8b3a19bb6589679a7c3c31f64312e7ff482a816c96e0cecec9ad3a7564dd"}, + {file = "mypy-1.3.0-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:cbc07246253b9e3d7d74c9ff948cd0fd7a71afcc2b77c7f0a59c26e9395cb152"}, + {file = "mypy-1.3.0-cp310-cp310-win_amd64.whl", hash = "sha256:a22435632710a4fcf8acf86cbd0d69f68ac389a3892cb23fbad176d1cddaf228"}, + {file = "mypy-1.3.0-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:6e33bb8b2613614a33dff70565f4c803f889ebd2f859466e42b46e1df76018dd"}, + {file = "mypy-1.3.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:7d23370d2a6b7a71dc65d1266f9a34e4cde9e8e21511322415db4b26f46f6b8c"}, + {file = "mypy-1.3.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:658fe7b674769a0770d4b26cb4d6f005e88a442fe82446f020be8e5f5efb2fae"}, + {file = "mypy-1.3.0-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:6e42d29e324cdda61daaec2336c42512e59c7c375340bd202efa1fe0f7b8f8ca"}, + {file = "mypy-1.3.0-cp311-cp311-win_amd64.whl", hash = "sha256:d0b6c62206e04061e27009481cb0ec966f7d6172b5b936f3ead3d74f29fe3dcf"}, + {file = "mypy-1.3.0-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:76ec771e2342f1b558c36d49900dfe81d140361dd0d2df6cd71b3db1be155409"}, + {file = "mypy-1.3.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:ebc95f8386314272bbc817026f8ce8f4f0d2ef7ae44f947c4664efac9adec929"}, + {file = "mypy-1.3.0-cp37-cp37m-musllinux_1_1_x86_64.whl", hash = "sha256:faff86aa10c1aa4a10e1a301de160f3d8fc8703b88c7e98de46b531ff1276a9a"}, + {file = "mypy-1.3.0-cp37-cp37m-win_amd64.whl", hash = "sha256:8c5979d0deb27e0f4479bee18ea0f83732a893e81b78e62e2dda3e7e518c92ee"}, + {file = 
"mypy-1.3.0-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:c5d2cc54175bab47011b09688b418db71403aefad07cbcd62d44010543fc143f"}, + {file = "mypy-1.3.0-cp38-cp38-macosx_11_0_arm64.whl", hash = "sha256:87df44954c31d86df96c8bd6e80dfcd773473e877ac6176a8e29898bfb3501cb"}, + {file = "mypy-1.3.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:473117e310febe632ddf10e745a355714e771ffe534f06db40702775056614c4"}, + {file = "mypy-1.3.0-cp38-cp38-musllinux_1_1_x86_64.whl", hash = "sha256:74bc9b6e0e79808bf8678d7678b2ae3736ea72d56eede3820bd3849823e7f305"}, + {file = "mypy-1.3.0-cp38-cp38-win_amd64.whl", hash = "sha256:44797d031a41516fcf5cbfa652265bb994e53e51994c1bd649ffcd0c3a7eccbf"}, + {file = "mypy-1.3.0-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:ddae0f39ca146972ff6bb4399f3b2943884a774b8771ea0a8f50e971f5ea5ba8"}, + {file = "mypy-1.3.0-cp39-cp39-macosx_11_0_arm64.whl", hash = "sha256:1c4c42c60a8103ead4c1c060ac3cdd3ff01e18fddce6f1016e08939647a0e703"}, + {file = "mypy-1.3.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:e86c2c6852f62f8f2b24cb7a613ebe8e0c7dc1402c61d36a609174f63e0ff017"}, + {file = "mypy-1.3.0-cp39-cp39-musllinux_1_1_x86_64.whl", hash = "sha256:f9dca1e257d4cc129517779226753dbefb4f2266c4eaad610fc15c6a7e14283e"}, + {file = "mypy-1.3.0-cp39-cp39-win_amd64.whl", hash = "sha256:95d8d31a7713510685b05fbb18d6ac287a56c8f6554d88c19e73f724a445448a"}, + {file = "mypy-1.3.0-py3-none-any.whl", hash = "sha256:a8763e72d5d9574d45ce5881962bc8e9046bf7b375b0abf031f3e6811732a897"}, + {file = "mypy-1.3.0.tar.gz", hash = "sha256:e1f4d16e296f5135624b34e8fb741eb0eadedca90862405b1f1fde2040b9bd11"}, ] [package.dependencies] @@ -1225,14 +1225,14 @@ files = [ [[package]] name = "nodeenv" -version = "1.7.0" +version = "1.8.0" description = "Node.js virtual environment builder" category = "dev" optional = false python-versions = ">=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*,!=3.5.*,!=3.6.*" files = [ - {file = 
"nodeenv-1.7.0-py2.py3-none-any.whl", hash = "sha256:27083a7b96a25f2f5e1d8cb4b6317ee8aeda3bdd121394e5ac54e498028a042e"}, - {file = "nodeenv-1.7.0.tar.gz", hash = "sha256:e0e7f7dfb85fc5394c6fe1e8fa98131a2473e04311a45afb6508f7cf1836fa2b"}, + {file = "nodeenv-1.8.0-py2.py3-none-any.whl", hash = "sha256:df865724bb3c3adc86b3876fa209771517b0cfe596beff01a92700e0e8be4cec"}, + {file = "nodeenv-1.8.0.tar.gz", hash = "sha256:d51e0c37e64fbf47d017feac3145cdbb58836d7eee8c6f6d3b6880c5456227d2"}, ] [package.dependencies] @@ -1354,48 +1354,48 @@ test = ["pytest"] [[package]] name = "pydantic" -version = "1.10.7" +version = "1.10.8" description = "Data validation and settings management using python type hints" category = "main" optional = false python-versions = ">=3.7" files = [ - {file = "pydantic-1.10.7-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:e79e999e539872e903767c417c897e729e015872040e56b96e67968c3b918b2d"}, - {file = "pydantic-1.10.7-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:01aea3a42c13f2602b7ecbbea484a98169fb568ebd9e247593ea05f01b884b2e"}, - {file = "pydantic-1.10.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:516f1ed9bc2406a0467dd777afc636c7091d71f214d5e413d64fef45174cfc7a"}, - {file = "pydantic-1.10.7-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:ae150a63564929c675d7f2303008d88426a0add46efd76c3fc797cd71cb1b46f"}, - {file = "pydantic-1.10.7-cp310-cp310-musllinux_1_1_i686.whl", hash = "sha256:ecbbc51391248116c0a055899e6c3e7ffbb11fb5e2a4cd6f2d0b93272118a209"}, - {file = "pydantic-1.10.7-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:f4a2b50e2b03d5776e7f21af73e2070e1b5c0d0df255a827e7c632962f8315af"}, - {file = "pydantic-1.10.7-cp310-cp310-win_amd64.whl", hash = "sha256:a7cd2251439988b413cb0a985c4ed82b6c6aac382dbaff53ae03c4b23a70e80a"}, - {file = "pydantic-1.10.7-cp311-cp311-macosx_10_9_x86_64.whl", hash = 
"sha256:68792151e174a4aa9e9fc1b4e653e65a354a2fa0fed169f7b3d09902ad2cb6f1"}, - {file = "pydantic-1.10.7-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:dfe2507b8ef209da71b6fb5f4e597b50c5a34b78d7e857c4f8f3115effaef5fe"}, - {file = "pydantic-1.10.7-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:10a86d8c8db68086f1e30a530f7d5f83eb0685e632e411dbbcf2d5c0150e8dcd"}, - {file = "pydantic-1.10.7-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:d75ae19d2a3dbb146b6f324031c24f8a3f52ff5d6a9f22f0683694b3afcb16fb"}, - {file = "pydantic-1.10.7-cp311-cp311-musllinux_1_1_i686.whl", hash = "sha256:464855a7ff7f2cc2cf537ecc421291b9132aa9c79aef44e917ad711b4a93163b"}, - {file = "pydantic-1.10.7-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:193924c563fae6ddcb71d3f06fa153866423ac1b793a47936656e806b64e24ca"}, - {file = "pydantic-1.10.7-cp311-cp311-win_amd64.whl", hash = "sha256:b4a849d10f211389502059c33332e91327bc154acc1845f375a99eca3afa802d"}, - {file = "pydantic-1.10.7-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:cc1dde4e50a5fc1336ee0581c1612215bc64ed6d28d2c7c6f25d2fe3e7c3e918"}, - {file = "pydantic-1.10.7-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:e0cfe895a504c060e5d36b287ee696e2fdad02d89e0d895f83037245218a87fe"}, - {file = "pydantic-1.10.7-cp37-cp37m-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:670bb4683ad1e48b0ecb06f0cfe2178dcf74ff27921cdf1606e527d2617a81ee"}, - {file = "pydantic-1.10.7-cp37-cp37m-musllinux_1_1_i686.whl", hash = "sha256:950ce33857841f9a337ce07ddf46bc84e1c4946d2a3bba18f8280297157a3fd1"}, - {file = "pydantic-1.10.7-cp37-cp37m-musllinux_1_1_x86_64.whl", hash = "sha256:c15582f9055fbc1bfe50266a19771bbbef33dd28c45e78afbe1996fd70966c2a"}, - {file = "pydantic-1.10.7-cp37-cp37m-win_amd64.whl", hash = "sha256:82dffb306dd20bd5268fd6379bc4bfe75242a9c2b79fec58e1041fbbdb1f7914"}, - {file = 
"pydantic-1.10.7-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:8c7f51861d73e8b9ddcb9916ae7ac39fb52761d9ea0df41128e81e2ba42886cd"}, - {file = "pydantic-1.10.7-cp38-cp38-macosx_11_0_arm64.whl", hash = "sha256:6434b49c0b03a51021ade5c4daa7d70c98f7a79e95b551201fff682fc1661245"}, - {file = "pydantic-1.10.7-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:64d34ab766fa056df49013bb6e79921a0265204c071984e75a09cbceacbbdd5d"}, - {file = "pydantic-1.10.7-cp38-cp38-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:701daea9ffe9d26f97b52f1d157e0d4121644f0fcf80b443248434958fd03dc3"}, - {file = "pydantic-1.10.7-cp38-cp38-musllinux_1_1_i686.whl", hash = "sha256:cf135c46099ff3f919d2150a948ce94b9ce545598ef2c6c7bf55dca98a304b52"}, - {file = "pydantic-1.10.7-cp38-cp38-musllinux_1_1_x86_64.whl", hash = "sha256:b0f85904f73161817b80781cc150f8b906d521fa11e3cdabae19a581c3606209"}, - {file = "pydantic-1.10.7-cp38-cp38-win_amd64.whl", hash = "sha256:9f6f0fd68d73257ad6685419478c5aece46432f4bdd8d32c7345f1986496171e"}, - {file = "pydantic-1.10.7-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:c230c0d8a322276d6e7b88c3f7ce885f9ed16e0910354510e0bae84d54991143"}, - {file = "pydantic-1.10.7-cp39-cp39-macosx_11_0_arm64.whl", hash = "sha256:976cae77ba6a49d80f461fd8bba183ff7ba79f44aa5cfa82f1346b5626542f8e"}, - {file = "pydantic-1.10.7-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:7d45fc99d64af9aaf7e308054a0067fdcd87ffe974f2442312372dfa66e1001d"}, - {file = "pydantic-1.10.7-cp39-cp39-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:d2a5ebb48958754d386195fe9e9c5106f11275867051bf017a8059410e9abf1f"}, - {file = "pydantic-1.10.7-cp39-cp39-musllinux_1_1_i686.whl", hash = "sha256:abfb7d4a7cd5cc4e1d1887c43503a7c5dd608eadf8bc615413fc498d3e4645cd"}, - {file = "pydantic-1.10.7-cp39-cp39-musllinux_1_1_x86_64.whl", hash = 
"sha256:80b1fab4deb08a8292d15e43a6edccdffa5377a36a4597bb545b93e79c5ff0a5"}, - {file = "pydantic-1.10.7-cp39-cp39-win_amd64.whl", hash = "sha256:d71e69699498b020ea198468e2480a2f1e7433e32a3a99760058c6520e2bea7e"}, - {file = "pydantic-1.10.7-py3-none-any.whl", hash = "sha256:0cd181f1d0b1d00e2b705f1bf1ac7799a2d938cce3376b8007df62b29be3c2c6"}, - {file = "pydantic-1.10.7.tar.gz", hash = "sha256:cfc83c0678b6ba51b0532bea66860617c4cd4251ecf76e9846fa5a9f3454e97e"}, + {file = "pydantic-1.10.8-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:1243d28e9b05003a89d72e7915fdb26ffd1d39bdd39b00b7dbe4afae4b557f9d"}, + {file = "pydantic-1.10.8-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:c0ab53b609c11dfc0c060d94335993cc2b95b2150e25583bec37a49b2d6c6c3f"}, + {file = "pydantic-1.10.8-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:f9613fadad06b4f3bc5db2653ce2f22e0de84a7c6c293909b48f6ed37b83c61f"}, + {file = "pydantic-1.10.8-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:df7800cb1984d8f6e249351139667a8c50a379009271ee6236138a22a0c0f319"}, + {file = "pydantic-1.10.8-cp310-cp310-musllinux_1_1_i686.whl", hash = "sha256:0c6fafa0965b539d7aab0a673a046466d23b86e4b0e8019d25fd53f4df62c277"}, + {file = "pydantic-1.10.8-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:e82d4566fcd527eae8b244fa952d99f2ca3172b7e97add0b43e2d97ee77f81ab"}, + {file = "pydantic-1.10.8-cp310-cp310-win_amd64.whl", hash = "sha256:ab523c31e22943713d80d8d342d23b6f6ac4b792a1e54064a8d0cf78fd64e800"}, + {file = "pydantic-1.10.8-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:666bdf6066bf6dbc107b30d034615d2627e2121506c555f73f90b54a463d1f33"}, + {file = "pydantic-1.10.8-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:35db5301b82e8661fa9c505c800d0990bc14e9f36f98932bb1d248c0ac5cada5"}, + {file = "pydantic-1.10.8-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = 
"sha256:f90c1e29f447557e9e26afb1c4dbf8768a10cc676e3781b6a577841ade126b85"}, + {file = "pydantic-1.10.8-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:93e766b4a8226e0708ef243e843105bf124e21331694367f95f4e3b4a92bbb3f"}, + {file = "pydantic-1.10.8-cp311-cp311-musllinux_1_1_i686.whl", hash = "sha256:88f195f582851e8db960b4a94c3e3ad25692c1c1539e2552f3df7a9e972ef60e"}, + {file = "pydantic-1.10.8-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:34d327c81e68a1ecb52fe9c8d50c8a9b3e90d3c8ad991bfc8f953fb477d42fb4"}, + {file = "pydantic-1.10.8-cp311-cp311-win_amd64.whl", hash = "sha256:d532bf00f381bd6bc62cabc7d1372096b75a33bc197a312b03f5838b4fb84edd"}, + {file = "pydantic-1.10.8-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:7d5b8641c24886d764a74ec541d2fc2c7fb19f6da2a4001e6d580ba4a38f7878"}, + {file = "pydantic-1.10.8-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:7b1f6cb446470b7ddf86c2e57cd119a24959af2b01e552f60705910663af09a4"}, + {file = "pydantic-1.10.8-cp37-cp37m-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:c33b60054b2136aef8cf190cd4c52a3daa20b2263917c49adad20eaf381e823b"}, + {file = "pydantic-1.10.8-cp37-cp37m-musllinux_1_1_i686.whl", hash = "sha256:1952526ba40b220b912cdc43c1c32bcf4a58e3f192fa313ee665916b26befb68"}, + {file = "pydantic-1.10.8-cp37-cp37m-musllinux_1_1_x86_64.whl", hash = "sha256:bb14388ec45a7a0dc429e87def6396f9e73c8c77818c927b6a60706603d5f2ea"}, + {file = "pydantic-1.10.8-cp37-cp37m-win_amd64.whl", hash = "sha256:16f8c3e33af1e9bb16c7a91fc7d5fa9fe27298e9f299cff6cb744d89d573d62c"}, + {file = "pydantic-1.10.8-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:1ced8375969673929809d7f36ad322934c35de4af3b5e5b09ec967c21f9f7887"}, + {file = "pydantic-1.10.8-cp38-cp38-macosx_11_0_arm64.whl", hash = "sha256:93e6bcfccbd831894a6a434b0aeb1947f9e70b7468f274154d03d71fabb1d7c6"}, + {file = 
"pydantic-1.10.8-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:191ba419b605f897ede9892f6c56fb182f40a15d309ef0142212200a10af4c18"}, + {file = "pydantic-1.10.8-cp38-cp38-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:052d8654cb65174d6f9490cc9b9a200083a82cf5c3c5d3985db765757eb3b375"}, + {file = "pydantic-1.10.8-cp38-cp38-musllinux_1_1_i686.whl", hash = "sha256:ceb6a23bf1ba4b837d0cfe378329ad3f351b5897c8d4914ce95b85fba96da5a1"}, + {file = "pydantic-1.10.8-cp38-cp38-musllinux_1_1_x86_64.whl", hash = "sha256:6f2e754d5566f050954727c77f094e01793bcb5725b663bf628fa6743a5a9108"}, + {file = "pydantic-1.10.8-cp38-cp38-win_amd64.whl", hash = "sha256:6a82d6cda82258efca32b40040228ecf43a548671cb174a1e81477195ed3ed56"}, + {file = "pydantic-1.10.8-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:3e59417ba8a17265e632af99cc5f35ec309de5980c440c255ab1ca3ae96a3e0e"}, + {file = "pydantic-1.10.8-cp39-cp39-macosx_11_0_arm64.whl", hash = "sha256:84d80219c3f8d4cad44575e18404099c76851bc924ce5ab1c4c8bb5e2a2227d0"}, + {file = "pydantic-1.10.8-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:2e4148e635994d57d834be1182a44bdb07dd867fa3c2d1b37002000646cc5459"}, + {file = "pydantic-1.10.8-cp39-cp39-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:12f7b0bf8553e310e530e9f3a2f5734c68699f42218bf3568ef49cd9b0e44df4"}, + {file = "pydantic-1.10.8-cp39-cp39-musllinux_1_1_i686.whl", hash = "sha256:42aa0c4b5c3025483240a25b09f3c09a189481ddda2ea3a831a9d25f444e03c1"}, + {file = "pydantic-1.10.8-cp39-cp39-musllinux_1_1_x86_64.whl", hash = "sha256:17aef11cc1b997f9d574b91909fed40761e13fac438d72b81f902226a69dac01"}, + {file = "pydantic-1.10.8-cp39-cp39-win_amd64.whl", hash = "sha256:66a703d1983c675a6e0fed8953b0971c44dba48a929a2000a493c3772eb61a5a"}, + {file = "pydantic-1.10.8-py3-none-any.whl", hash = 
"sha256:7456eb22ed9aaa24ff3e7b4757da20d9e5ce2a81018c1b3ebd81a0b88a18f3b2"}, + {file = "pydantic-1.10.8.tar.gz", hash = "sha256:1410275520dfa70effadf4c21811d755e7ef9bb1f1d077a21958153a92c8d9ca"}, ] [package.dependencies] @@ -1440,42 +1440,42 @@ tests = ["coverage[toml] (==5.0.4)", "pytest (>=6.0.0,<7.0.0)"] [[package]] name = "pymupdf" -version = "1.22.2" +version = "1.22.3" description = "Python bindings for the PDF toolkit and renderer MuPDF" category = "main" optional = false python-versions = ">=3.7" files = [ - {file = "PyMuPDF-1.22.2-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:defab2e12035e7f0044b60fca7b3841b87e6ca9a99a9d2c8efdbed8afdbc0f38"}, - {file = "PyMuPDF-1.22.2-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:495f2d3fdfcf403bd64e65f93554e15fa61b858f13bfcabfca8e6518fde46f45"}, - {file = "PyMuPDF-1.22.2-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:09eca2490e5779ca310caaf46153e3e66099c67ef13ae8a08ae071a4e44707de"}, - {file = "PyMuPDF-1.22.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:abe501d538123d7b1c92e0b9cb06de53e11721cadf412ed7046f80b16f8bd61c"}, - {file = "PyMuPDF-1.22.2-cp310-cp310-win32.whl", hash = "sha256:b4dd727c6e8957112b2687a4db7f0ac39a1e056d0b1fb66925a78a4ec1cd9244"}, - {file = "PyMuPDF-1.22.2-cp310-cp310-win_amd64.whl", hash = "sha256:69341a5a8fa66b42eb93aed757d8a4c1be068de8a29524ef0fd3e328692aa398"}, - {file = "PyMuPDF-1.22.2-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:533fa0799bfdca66256b7487def30943c92fa79dcef58d7df943f12d2a818404"}, - {file = "PyMuPDF-1.22.2-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:a8132b5c4527e3419f2864f92429dc729cc0c01a3712b950c320cad4f0063127"}, - {file = "PyMuPDF-1.22.2-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:a93e14e27e8cc7f1bef1d50502c0b5a17e95f0efc712112fea96a9c72afc2d6e"}, - {file = "PyMuPDF-1.22.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = 
"sha256:fb6a979513e332352ef8eb8b089520b8343bb0ffa97a2bb87c782cbe9343375f"}, - {file = "PyMuPDF-1.22.2-cp311-cp311-win32.whl", hash = "sha256:c5bfeb181926bee7a68058d3f2c7f821c2a7cce19a7ef2359fcd4eb299096a3b"}, - {file = "PyMuPDF-1.22.2-cp311-cp311-win_amd64.whl", hash = "sha256:d525f43b17a3af82d00f602988f0d3ead1b5e6b58df22854afb905804e7b97dc"}, - {file = "PyMuPDF-1.22.2-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:490cd9b13b93aae57181cdefdbadd716cfe08f315b115169154b6dfc7832aad1"}, - {file = "PyMuPDF-1.22.2-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:465d65434c03c62afa9e6fa52aca10f8ddc09de97eea8dff524f59c95f4b7205"}, - {file = "PyMuPDF-1.22.2-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:d145a0f61578a9f67d2c412e4f8d488e59300e31f572422d5e508fc06d4f848d"}, - {file = "PyMuPDF-1.22.2-cp37-cp37m-win32.whl", hash = "sha256:e8fb53592aefcf0bd7f580c357e6c1bbc9a9598722e9cb82890c2b774c761c2c"}, - {file = "PyMuPDF-1.22.2-cp37-cp37m-win_amd64.whl", hash = "sha256:4fc503d8d9b005e06796dd0f251216d2bf5192cc903d8b4a67a04fc99d9a0c78"}, - {file = "PyMuPDF-1.22.2-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:6caf1ef4e4543022322205ad5461ff14f72cd7c36e5b575a787b4da83557904c"}, - {file = "PyMuPDF-1.22.2-cp38-cp38-macosx_11_0_arm64.whl", hash = "sha256:baf126edd3dbf2921fc251ca779e35234f2b453a3c623c3da3577d997de074d2"}, - {file = "PyMuPDF-1.22.2-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:7262097be7d8db9c47f15547bbb16beb6e5d5f231dc2f34642d13e1a77951b36"}, - {file = "PyMuPDF-1.22.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:93c5ab9871661204398171469c2f6dce9ddce9e06ab834ff5604488367bec1ea"}, - {file = "PyMuPDF-1.22.2-cp38-cp38-win32.whl", hash = "sha256:c809906557bd98627eae7504869948ff875f5a1e3ac1c7ff40e90547755db4e1"}, - {file = "PyMuPDF-1.22.2-cp38-cp38-win_amd64.whl", hash = "sha256:7fdb24032a78bb7051019580d8e9bbf30aa74e3410ae6fd8a46211a927bc44c4"}, - {file = 
"PyMuPDF-1.22.2-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:249051552fa2db72014515872e3984277fa72572c940f14b6489c463628abaf9"}, - {file = "PyMuPDF-1.22.2-cp39-cp39-macosx_11_0_arm64.whl", hash = "sha256:7993e00e6bf80e36b5d2bb1e785c2b6aaebcdf515a2609b5602a5d7c673fc901"}, - {file = "PyMuPDF-1.22.2-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:5e4e5ff26d09817dc4328bb8c17122d0fdb19edf9776c301513055d10a432b18"}, - {file = "PyMuPDF-1.22.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:98a15382edb5c57ea3cdeabfb8417f7d57fb9aba182b62780d78b82b2b96e800"}, - {file = "PyMuPDF-1.22.2-cp39-cp39-win32.whl", hash = "sha256:c8e60487d38272ae49714e709ce28c7126a2318620d99c89f26bd5da89d4e38f"}, - {file = "PyMuPDF-1.22.2-cp39-cp39-win_amd64.whl", hash = "sha256:489ecfaa2e35acaf3ddfe4b86d8a055e3cb7ec749df3cb486fe926c988ee76e0"}, - {file = "PyMuPDF-1.22.2.tar.gz", hash = "sha256:179fb3cb69de9727f73b5ff1745c91819da73ba2304791d198cde11495beb712"}, + {file = "PyMuPDF-1.22.3-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:0aff7ba35eb2cc285efea87500dd5ee0aaf94f4bb23a79187f0a74101aba7964"}, + {file = "PyMuPDF-1.22.3-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:13e90a5301990dafc5bba6bfa32aafca1f35809497c274c9d4af4f4bac2d8870"}, + {file = "PyMuPDF-1.22.3-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:201c7aecf9530c3a5aa33cd3d6b68e36492ff9ac48cb270d8f18e66654744419"}, + {file = "PyMuPDF-1.22.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:dbffc6cabb0cb20033870bde954bbed1436cf9fce33a14682e283bc893767250"}, + {file = "PyMuPDF-1.22.3-cp310-cp310-win32.whl", hash = "sha256:e344632215882b49fd2e28ffb848f55b1b34db6b5389917e4865b4d779cbdb4a"}, + {file = "PyMuPDF-1.22.3-cp310-cp310-win_amd64.whl", hash = "sha256:9d9bccfb29cbe3962a858c200376d54e7ba64d6f64c0b972ed5b68ff20157b06"}, + {file = "PyMuPDF-1.22.3-cp311-cp311-macosx_10_9_x86_64.whl", hash = 
"sha256:01daa4e3c2c1b93d357ba0d747d713ad40e0123b9bdca2395bf166f62dd8f703"}, + {file = "PyMuPDF-1.22.3-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:46c7fab408ae4d55c4181f95a76bc4f365f5ead3291f67274d6fe90f1b90c479"}, + {file = "PyMuPDF-1.22.3-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:a58af441ce454f33f75a4c93a5f76e4659f2c7c849036180f24ab4b84d9e512f"}, + {file = "PyMuPDF-1.22.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:6eddb0975ddd0bcf39812616b5675c26d740f83b12a39c3b5c4425f02c3da754"}, + {file = "PyMuPDF-1.22.3-cp311-cp311-win32.whl", hash = "sha256:ed4a624ffc9bebe5c67fc80e16798300d404089585bcdac14448034bd38c5072"}, + {file = "PyMuPDF-1.22.3-cp311-cp311-win_amd64.whl", hash = "sha256:4d2422dffdb4f1c2c8128e6d151f4de5e722388df276ac165572ad5290ad228a"}, + {file = "PyMuPDF-1.22.3-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:48ece127e202470209dc63ad8fa85f3e19ce302f5af02d38c7fc0b5798b9bfa6"}, + {file = "PyMuPDF-1.22.3-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:1f00097e8d2bc46dacdb776aeb810b1c760949f6353abdf6d12e8aefdc95dd35"}, + {file = "PyMuPDF-1.22.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:5932564a713bd7d576418070c3dd926cb5800edb4411f48813f7694af7386d3e"}, + {file = "PyMuPDF-1.22.3-cp37-cp37m-win32.whl", hash = "sha256:d4f38ecb9518ba2dc12f5f35f33c64ec5466faf20b833f4ac21a2a4190ffef93"}, + {file = "PyMuPDF-1.22.3-cp37-cp37m-win_amd64.whl", hash = "sha256:90950b328603a83b26c2eb2af0cf5498582fbbab84e86074bbb0ae44d745e2a3"}, + {file = "PyMuPDF-1.22.3-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:0a2040351a1279fafa1db82e5af50a785eb01dc4e1adb3c98e0abfd6e0a4995f"}, + {file = "PyMuPDF-1.22.3-cp38-cp38-macosx_11_0_arm64.whl", hash = "sha256:a67f2b12120ce9fe5c3f7cb192643134af2c4e28773a2cd5d56cbe1cae66d1b9"}, + {file = "PyMuPDF-1.22.3-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = 
"sha256:4e0904c9bffdfbb527f4fe293986d74477780f0c98f59fa5b42a95e3e441e1f4"}, + {file = "PyMuPDF-1.22.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:9aaf3352d9c443ad7622e70b0ff9124079b09c16a1a1aa3f3dde9ba0e19f32a2"}, + {file = "PyMuPDF-1.22.3-cp38-cp38-win32.whl", hash = "sha256:4c037d5752efd562ac72e74295dfcc8d8dd406c0f6849054b29d2cbc32237ae0"}, + {file = "PyMuPDF-1.22.3-cp38-cp38-win_amd64.whl", hash = "sha256:be0803be2709285f17c932ee11d4b7f6d11d3e74e1888094e6310c55e9543673"}, + {file = "PyMuPDF-1.22.3-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:fa934c1a02f1f3bb04e447b95ef5b19d03cb2575fee76d23cb7a6d0c526444e2"}, + {file = "PyMuPDF-1.22.3-cp39-cp39-macosx_11_0_arm64.whl", hash = "sha256:932747941ed4973410244376ba77693253e4387e8e09cf2458bc9133348fc16e"}, + {file = "PyMuPDF-1.22.3-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:d4ea7b016c4561004b48143b8879e1d888e5ba3a1440e6558ea9a47f0d2e6f65"}, + {file = "PyMuPDF-1.22.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:bf275e5dbf332554f98b469899e5a0928b91cb574a5319aeecf1b7e8075cf4b7"}, + {file = "PyMuPDF-1.22.3-cp39-cp39-win32.whl", hash = "sha256:07d171255964f5a382e280a95a3148c08fc4ec20bf7907e040cf423cf29afe30"}, + {file = "PyMuPDF-1.22.3-cp39-cp39-win_amd64.whl", hash = "sha256:60db199553fc9c88cb9f2afba35f9cd54c042e7a6ea2b151ddcc542e6e75ac61"}, + {file = "PyMuPDF-1.22.3.tar.gz", hash = "sha256:5ecd928e96e63092571020973aa145b57b75707f3a3df97c742e563112615891"}, ] [[package]] @@ -1698,14 +1698,14 @@ jupyter = ["ipywidgets (>=7.5.1,<8.0.0)"] [[package]] name = "ruamel-yaml" -version = "0.17.26" +version = "0.17.31" description = "ruamel.yaml is a YAML parser/emitter that supports roundtrip preservation of comments, seq/map flow style, and map key order" category = "dev" optional = false python-versions = ">=3" files = [ - {file = "ruamel.yaml-0.17.26-py3-none-any.whl", hash = 
"sha256:25d0ee82a0a9a6f44683dcf8c282340def4074a4562f3a24f55695bb254c1693"}, - {file = "ruamel.yaml-0.17.26.tar.gz", hash = "sha256:baa2d0a5aad2034826c439ce61c142c07082b76f4791d54145e131206e998059"}, + {file = "ruamel.yaml-0.17.31-py3-none-any.whl", hash = "sha256:3cf153f0047ced526e723097ac615d3009371779432e304dbd5596b6f3a4c777"}, + {file = "ruamel.yaml-0.17.31.tar.gz", hash = "sha256:098ed1eb6d338a684891a72380277c1e6fc4d4ae0e120de9a447275056dda335"}, ] [package.dependencies] @@ -1814,19 +1814,19 @@ gitlab = ["python-gitlab (>=1.3.0)"] [[package]] name = "setuptools" -version = "67.7.2" +version = "67.8.0" description = "Easily download, build, install, upgrade, and uninstall Python packages" category = "main" optional = false python-versions = ">=3.7" files = [ - {file = "setuptools-67.7.2-py3-none-any.whl", hash = "sha256:23aaf86b85ca52ceb801d32703f12d77517b2556af839621c641fca11287952b"}, - {file = "setuptools-67.7.2.tar.gz", hash = "sha256:f104fa03692a2602fa0fec6c6a9e63b6c8a968de13e17c026957dd1f53d80990"}, + {file = "setuptools-67.8.0-py3-none-any.whl", hash = "sha256:5df61bf30bb10c6f756eb19e7c9f3b473051f48db77fddbe06ff2ca307df9a6f"}, + {file = "setuptools-67.8.0.tar.gz", hash = "sha256:62642358adc77ffa87233bc4d2354c4b2682d214048f500964dbe760ccedf102"}, ] [package.extras] docs = ["furo", "jaraco.packaging (>=9)", "jaraco.tidelift (>=1.4)", "pygments-github-lexers (==0.0.5)", "rst.linker (>=1.9)", "sphinx (>=3.5)", "sphinx-favicon", "sphinx-hoverxref (<2)", "sphinx-inline-tabs", "sphinx-lint", "sphinx-notfound-page (==0.8.3)", "sphinx-reredirects", "sphinxcontrib-towncrier"] -testing = ["build[virtualenv]", "filelock (>=3.4.0)", "flake8 (<5)", "flake8-2020", "ini2toml[lite] (>=0.9)", "jaraco.envs (>=2.2)", "jaraco.path (>=3.2.0)", "pip (>=19.1)", "pip-run (>=8.8)", "pytest (>=6)", "pytest-black (>=0.3.7)", "pytest-checkdocs (>=2.4)", "pytest-cov", "pytest-enabler (>=1.3)", "pytest-flake8", "pytest-mypy (>=0.9.1)", "pytest-perf", "pytest-timeout", "pytest-xdist", 
"tomli-w (>=1.0.0)", "virtualenv (>=13.0.0)", "wheel"] +testing = ["build[virtualenv]", "filelock (>=3.4.0)", "flake8-2020", "ini2toml[lite] (>=0.9)", "jaraco.envs (>=2.2)", "jaraco.path (>=3.2.0)", "pip (>=19.1)", "pip-run (>=8.8)", "pytest (>=6)", "pytest-black (>=0.3.7)", "pytest-checkdocs (>=2.4)", "pytest-cov", "pytest-enabler (>=1.3)", "pytest-mypy (>=0.9.1)", "pytest-perf", "pytest-ruff", "pytest-timeout", "pytest-xdist", "tomli-w (>=1.0.0)", "virtualenv (>=13.0.0)", "wheel"] testing-integration = ["build[virtualenv]", "filelock (>=3.4.0)", "jaraco.envs (>=2.2)", "jaraco.path (>=3.2.0)", "pytest", "pytest-enabler", "pytest-xdist", "tomli", "virtualenv (>=13.0.0)", "wheel"] [[package]] @@ -1898,26 +1898,26 @@ files = [ [[package]] name = "types-python-dateutil" -version = "2.8.19.12" +version = "2.8.19.13" description = "Typing stubs for python-dateutil" category = "dev" optional = false python-versions = "*" files = [ - {file = "types-python-dateutil-2.8.19.12.tar.gz", hash = "sha256:355b2cb82b31e556fd18e7b074de7c350c680ab80608f0cc55ba6770d986d67d"}, - {file = "types_python_dateutil-2.8.19.12-py3-none-any.whl", hash = "sha256:fe5b545e678ec13e3ddc83a0eee1545c1b5e2fba4cfc39b276ab6f4e7604a923"}, + {file = "types-python-dateutil-2.8.19.13.tar.gz", hash = "sha256:09a0275f95ee31ce68196710ed2c3d1b9dc42e0b61cc43acc369a42cb939134f"}, + {file = "types_python_dateutil-2.8.19.13-py3-none-any.whl", hash = "sha256:0b0e7c68e7043b0354b26a1e0225cb1baea7abb1b324d02b50e2d08f1221043f"}, ] [[package]] name = "types-requests" -version = "2.30.0.0" +version = "2.31.0.1" description = "Typing stubs for requests" category = "dev" optional = false python-versions = "*" files = [ - {file = "types-requests-2.30.0.0.tar.gz", hash = "sha256:dec781054324a70ba64430ae9e62e7e9c8e4618c185a5cb3f87a6738251b5a31"}, - {file = "types_requests-2.30.0.0-py3-none-any.whl", hash = "sha256:c6cf08e120ca9f0dc4fa4e32c3f953c3fba222bcc1db6b97695bce8da1ba9864"}, + {file = "types-requests-2.31.0.1.tar.gz", 
hash = "sha256:3de667cffa123ce698591de0ad7db034a5317457a596eb0b4944e5a9d9e8d1ac"}, + {file = "types_requests-2.31.0.1-py3-none-any.whl", hash = "sha256:afb06ef8f25ba83d59a1d424bd7a5a939082f94b94e90ab5e6116bd2559deaa3"}, ] [package.dependencies] @@ -1925,26 +1925,26 @@ types-urllib3 = "*" [[package]] name = "types-urllib3" -version = "1.26.25.12" +version = "1.26.25.13" description = "Typing stubs for urllib3" category = "dev" optional = false python-versions = "*" files = [ - {file = "types-urllib3-1.26.25.12.tar.gz", hash = "sha256:a1557355ce8d350a555d142589f3001903757d2d36c18a66f588d9659bbc917d"}, - {file = "types_urllib3-1.26.25.12-py3-none-any.whl", hash = "sha256:3ba3d3a8ee46e0d5512c6bd0594da4f10b2584b47a470f8422044a2ab462f1df"}, + {file = "types-urllib3-1.26.25.13.tar.gz", hash = "sha256:3300538c9dc11dad32eae4827ac313f5d986b8b21494801f1bf97a1ac6c03ae5"}, + {file = "types_urllib3-1.26.25.13-py3-none-any.whl", hash = "sha256:5dbd1d2bef14efee43f5318b5d36d805a489f6600252bb53626d4bfafd95e27c"}, ] [[package]] name = "typing-extensions" -version = "4.5.0" +version = "4.6.3" description = "Backported and Experimental Type Hints for Python 3.7+" category = "main" optional = false python-versions = ">=3.7" files = [ - {file = "typing_extensions-4.5.0-py3-none-any.whl", hash = "sha256:fb33085c39dd998ac16d1431ebc293a8b3eedd00fd4a32de0ff79002c19511b4"}, - {file = "typing_extensions-4.5.0.tar.gz", hash = "sha256:5cb5f4a79139d699607b3ef622a1dedafa84e115ab0024e0d9c044a9479ca7cb"}, + {file = "typing_extensions-4.6.3-py3-none-any.whl", hash = "sha256:88a4153d8505aabbb4e13aacb7c486c2b4a33ca3b3f807914a9b4c844c471c26"}, + {file = "typing_extensions-4.6.3.tar.gz", hash = "sha256:d91d5919357fe7f681a9f2b5b4cb2a5f1ef0a1e9f59c4d8ff0d3491e05c0ffd5"}, ] [[package]] @@ -2024,14 +2024,14 @@ files = [ [[package]] name = "urllib3" -version = "1.26.15" +version = "1.26.16" description = "HTTP library with thread-safe connection pooling, file post, and more." 
category = "main" optional = false python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*, !=3.5.*" files = [ - {file = "urllib3-1.26.15-py2.py3-none-any.whl", hash = "sha256:aa751d169e23c7479ce47a0cb0da579e3ede798f994f5816a74e4f4500dcea42"}, - {file = "urllib3-1.26.15.tar.gz", hash = "sha256:8a388717b9476f934a21484e8c8e61875ab60644d29b9b39e11e4b9dc1c6b305"}, + {file = "urllib3-1.26.16-py2.py3-none-any.whl", hash = "sha256:8d36afa7616d8ab714608411b4a3b13e58f463aee519024578e062e141dce20f"}, + {file = "urllib3-1.26.16.tar.gz", hash = "sha256:8f135f6502756bde6b2a9b28989df5fbe87c9970cecaa69041edcce7f0589b14"}, ] [package.extras] diff --git a/pyproject.toml b/pyproject.toml index 01fd5012..51b85d68 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -1,7 +1,7 @@ [tool.poetry] -name = "wcdimportbot" -version = "3.0.0-alpha1" -description = "Import and update architecture for the WikiCitations Database in Wikibase.cloud" +name = "Internet Archive Reference Inventory (IARI)" +version = "4.1.0" +description = "API capable of fetching, extracting, transforming and storing reference information from Wikipedia articles, websites and PDFs as structured data." 
 authors = ["Dennis Priskorn <68460690+dpriskorn@users.noreply.github.com>"]
 license = "GPLv3+"
 readme = "README.md"
diff --git a/src/models/api/handlers/pdf.py b/src/models/api/handlers/pdf.py
index 028d3647..7fc1e1fa 100644
--- a/src/models/api/handlers/pdf.py
+++ b/src/models/api/handlers/pdf.py
@@ -37,7 +37,6 @@ class PdfHandler(BaseHandler):
     text_pages_without_spaces: Dict[int, str] = {}
     url_annotations: Dict[int, List[Any]] = {}
     error_details: Tuple[int, str] = (0, "")
-    # urls_fixed: List[str] = []
     file_path: str = ""
     pdf_document: Optional[Document] = None
     word_counts: List[int] = []
@@ -313,23 +312,6 @@ def get_dict(self):
         # exit()
         return data
 
-    # def __get_cleaned_page_string__(self, number) -> str:
-    #     page_string = self.text_pages[number]
-    #     page_string = self.__clean_linebreaks__(string=page_string)
-    #     return page_string
-
-    # def __fix_doi_typing_errors__(self, string):
-    #     """This fixes common typing errors that we found"""
-    #     # From https://s3.documentcloud.org/documents/23782225/mwg-fdr-document-04-16-23-1.pdf page 298
-    #     if "https://doi.org:" in string:
-    #         self.urls_fixed.append("https://doi.org:")
-    #         string = string.replace("https://doi.org:", "https://doi.org/")
-    #     # From https://s3.documentcloud.org/documents/23782225/mwg-fdr-document-04-16-23-1.pdf page 298
-    #     if "https://doi.or/" in string:
-    #         self.urls_fixed.append("https://doi.or/")
-    #         string = string.replace("https://doi.or/", "https://doi.org/")
-    #     return string
-
     def __read_pdf_from_file__(self):
         """This is needed for fast testing on pdfs in test_data"""
         with open(self.file_path, "rb") as file:
diff --git a/tests/checking/test_url.py b/tests/checking/test_url.py
index c20767e2..222545c4 100644
--- a/tests/checking/test_url.py
+++ b/tests/checking/test_url.py
@@ -64,7 +64,7 @@ def test_check_bad_long_tld(self):
         # assert url.response_headers == {}
 
     def test_check_403(self):
-        url = Url(url=self.forbidden_url_if_not_spoofed_headers, timeout=2)
+        url = Url(url=self.forbidden_url_if_not_spoofed_headers, timeout=5)
         url.check()
         assert url.status_code == 200
         assert url.dns_error is False