Commit b9e9c3c8 authored by Donnawetter, committed by Malte Schokolowski

fix comments and reformat files from verarbeitung

parent 2cb5ff1e
1 merge request: !32 verarbeitung: final merge to main
Showing 1022 additions and 898 deletions
# Projekt CiS-Projekt 2021/22
Processing-Package to generate a theoretical graph for citations and references of given input publications.
## Usage/Examples
@@ -13,38 +13,35 @@ def main(url_list):
```
Basic process:
The UI hands a list of DOIs over to the processing package, which converts it into a set of nodes and edges that
represent the citations. The information about the papers and their citations comes from the input group via the
call of the Publication function. The node and edge sets are passed to the output as a JSON file.
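A minimal usage sketch of this process. The `Processing` signature and return value are taken from process_main.py further down in this merge request; the import path and the wiring are assumptions for illustration, not the actual UI code.
```python
from verarbeitung.process_main import Processing  # import path assumed from the directory layout

# list of DOI urls handed over by the UI (example DOI taken from verarbeitung_test.py)
url_list = ['https://pubs.acs.org/doi/10.1021/acs.jcim.9b00249']

# builds (or updates) the citation graph up to depth/height 2 and writes it to json_text.json
error_doi_list = Processing(url_list, 2, 2, json_file='json_text.json')

# DOIs that could not be processed are returned to the caller
print(error_doi_list)
```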
## Files and functions in directory
get_pub_from_input.py:
```python
def get_pub(pub_doi, test_var)
```
- Returns, for a given DOI, a class object in which all necessary information is stored.
process_main.py:
```python
def Processing(url_list)
```
- Checks whether a JSON file already exists and then calls either the function to create a new graph or the function to update an existing one (see the sketch after this list).
start.script.py:
- Needed to call the files across directories. Only for internal testing of the functionality.
<name>.json:
- are currently examples that could be passed to the output.
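A rough sketch of the create-or-update decision described for process_main.py above. The imported helpers appear in the diffs below; the file-existence check and the exact `update_graph` call are assumptions, not the actual implementation.
```python
from os import path

from verarbeitung.construct_new_graph.initialize_graph import init_graph_construction
from verarbeitung.construct_new_graph.export_to_json import output_to_json
from verarbeitung.update_graph.import_from_json import input_from_json
from verarbeitung.update_graph.update_graph import update_graph


def processing_sketch(url_list, search_depth, search_height, json_file='json_text.json'):
    if path.isfile(json_file):
        # a graph was exported before: read it back in and update it (assumed update_graph signature)
        old_nodes, old_edges = input_from_json(json_file)
        nodes, edges, error_doi_list = update_graph(url_list, old_nodes, old_edges)
    else:
        # no graph yet: construct a new one from the input DOIs
        nodes, edges, error_doi_list = init_graph_construction(url_list, search_depth, search_height)
    output_to_json(nodes, edges, search_depth, search_height, json_file)
    return error_doi_list
```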
@@ -54,8 +51,8 @@ start.script.py:
python -m unittest discover verarbeitung/test -v
## Authors
- Donna Löding
- Alina Molkentin
- Judith Große
- Malte Schokolowski
@@ -24,6 +24,5 @@ export_to_json.py
## Authors
- Donna Löding
- Alina Molkentin
- Judith Große
- Malte Schokolowski
# -*- coding: utf-8 -*-
"""
Functions to add citations recursively for multiple ACS/Nature journals
"""
__authors__ = "Donna Löding, Alina Molkentin, Judith Große, Malte Schokolowski"
__email__ = "cis-project2021@zbh.uni-hamburg.de"
__status__ = "Production"
# __copyright__ = ""
# __credits__ = ["", "", "", ""]
# __license__ = ""
@@ -17,12 +18,15 @@ __status__ = "Production"
import sys
from pathlib import Path
from os import error

sys.path.append("../")
from input.publication import Publication
from verarbeitung.get_pub_from_input import get_pub


def create_graph_structure_citations_test(pub, search_depth, search_depth_max, cit_type, test_var, test_nodes,
                                          test_edges):
    '''
    :param test_nodes: list of publications from unit test
    :type test_nodes: List[Publication]
@@ -43,7 +47,7 @@ def get_cit_type_list(pub, cit_type):
    :param pub: Publication which citations will be added
    :type pub: Publication

    :param cit_type: variable to differentiate citation and reference call
    :type cit_type: String

    function to return citation or reference list for given pub
@@ -55,6 +59,7 @@ def get_cit_type_list(pub, cit_type):
    else:
        return (ValueError)
def create_global_lists_cit(input_nodes, input_edges, pub, search_depth, search_depth_max, cit_type, test_var):
    '''
    :param input_nodes: list of nodes from Processing
@@ -72,10 +77,10 @@ def create_global_lists_cit(input_nodes, input_edges, pub, search_depth, search_
    :param search_depth_max: maximum depth to search for citations
    :type search_depth_max: int

    :param cit_type: variable to differentiate citation and reference call
    :type cit_type: String

    :param test_var: variable to differentiate between test and url call
    :type test_var: boolean

    function to create nodes and edges and call create_graph_structure_citations
@@ -99,10 +104,10 @@ def create_graph_structure_citations(pub, search_depth, search_depth_max, cit_ty
    :param search_depth_max: maximum depth to search for citations
    :type search_depth_max: int

    :param cit_type: variable to differentiate citation and reference call
    :type cit_type: String

    :param test_var: variable to differentiate between test and url call
    :type test_var: boolean

    adds a node for every citing publication that is not yet known
@@ -154,10 +159,10 @@ def process_citations_rec(citations_pub_obj_list, search_depth, search_depth_max
    :param search_depth_max: maximum depth to search for citations
    :type search_depth_max: int

    :param cit_type: variable to differentiate citation and reference call
    :type cit_type: String

    :param test_var: variable to differentiate between test and url call
    :type test_var: boolean

    recursive function to implement depth-first-search on citations
@@ -167,11 +172,12 @@ def process_citations_rec(citations_pub_obj_list, search_depth, search_depth_max
    new_citation_pub_obj_save_list = []

    for pub in citations_pub_obj_list:
        new_citation_pub_obj_list = create_graph_structure_citations(pub, search_depth, search_depth_max, cit_type,
                                                                     test_var)
        if len(new_citation_pub_obj_list) > 0:
            new_citation_pub_obj_save_list += new_citation_pub_obj_list

    # If the maximum depth has not yet been reached, calls function recursively with increased depth
    if (search_depth < search_depth_max):
        process_citations_rec(new_citation_pub_obj_save_list, search_depth + 1, search_depth_max, cit_type, test_var)
@@ -193,10 +199,10 @@ def add_citations(input_nodes, input_edges, citations_pub_obj_list, search_depth
    :param search_depth_max: maximum depth to search for citations
    :type search_depth_max: int

    :param cit_type: variable to differentiate citation and reference call
    :type cit_type: String

    :param test_var: variable to differentiate between test and url call
    :type test_var: boolean

    function to call recursive depth-first-search of citations
@@ -206,4 +212,3 @@ def add_citations(input_nodes, input_edges, citations_pub_obj_list, search_depth
    edges = input_edges

    process_citations_rec(citations_pub_obj_list, search_depth, search_depth_max, cit_type, test_var)
@@ -4,9 +4,10 @@ Functions that format the computed graph to match the interface to the output-pa
"""
__authors__ = "Donna Löding, Alina Molkentin, Judith Große, Malte Schokolowski"
__email__ = "cis-project2021@zbh.uni-hamburg.de"
__status__ = "Production"
# __copyright__ = ""
# __credits__ = ["", "", "", ""]
# __license__ = ""
@@ -43,7 +44,8 @@ def format_nodes(nodes):
        list_of_node_dicts.append(new_dict)
    return list_of_node_dicts


# creates a list that contains a dictionary for each edge
# the dictionaries contain the source as keys and the target as values
def format_edges(edges):
    '''
@@ -69,7 +71,7 @@ def output_to_json(nodes, edges, search_depth, search_height, json_file = 'json_
    :param edges: list of links to export to json
    :type edges: List[String,String]

    :param test_var: variable to differentiate between test and url call
    :type test_var: boolean

    function to export nodes and links as a dictionary to json file
...
@@ -4,9 +4,10 @@ Functions to generate a graph representing citations between multiple ACS/Nature
"""
__authors__ = "Donna Löding, Alina Molkentin, Judith Große, Malte Schokolowski"
__email__ = "cis-project2021@zbh.uni-hamburg.de"
__status__ = "Production"
# __copyright__ = ""
# __credits__ = ["", "", "", ""]
# __license__ = ""
@@ -18,6 +19,7 @@ import sys
import gc
from pathlib import Path
from os import error

sys.path.append("../")
from input.publication import Publication
@@ -25,6 +27,7 @@ from verarbeitung.get_pub_from_input import get_pub
from .export_to_json import output_to_json
from .add_citations_rec import add_citations, create_global_lists_cit


def initialize_nodes_list_test(doi_input_list, search_depth_max, search_height_max, test_var):
    '''
    for unit test purposes only
@@ -34,6 +37,7 @@ def initialize_nodes_list_test(doi_input_list, search_depth_max, search_height_m
    edges = []

    return (initialize_nodes_list(doi_input_list, search_depth_max, search_height_max, test_var))
def complete_inner_edges_test(test_nodes, test_edges):
    '''
    :param test_nodes: list of publications from unit test
@@ -51,6 +55,7 @@ def complete_inner_edges_test(test_nodes, test_edges):
    complete_inner_edges()
    return (nodes, edges)


def initialize_nodes_list(doi_input_list, search_depth_max, search_height_max, test_var):
    '''
    :param doi_input_list: input list of doi from UI
@@ -62,10 +67,10 @@ def initialize_nodes_list(doi_input_list, search_depth_max, search_height_max, t
    :param search_height_max: maximum height to search for citations
    :type search_height_max: int

    :param test_var: variable to differentiate between test and url call
    :type test_var: boolean

    adds input DOIs to nodes and retrieves citations and references for input publications
    '''

    # saves found citations and references in lists
@@ -83,15 +88,15 @@ def initialize_nodes_list(doi_input_list, search_depth_max, search_height_max, t
        not_in_nodes = True  # boolean value to check if a node already exists in the set of nodes

        for node in nodes:  # iterates over every node in the set of nodes
            if (pub.doi_url == node.doi_url):  # determines that a node with this DOI already is in the set
                not_in_nodes = False  # false --> node will not be created
                node.group = 0
                break

        if (not_in_nodes):  # there is no node with this DOI in the set
            nodes.append(pub)  # appends Publication Object
            pub.group = 0
        else:
            doi_input_list.remove(pub_doi)  # deletes the DOI-duplicate from input list

    # inserts references as publication objects into list and
    # inserts first depth references into nodes/edges if maximum search depth > 0
@@ -106,7 +111,6 @@ def initialize_nodes_list(doi_input_list, search_depth_max, search_height_max, t
    return (references_pub_obj_list, citations_pub_obj_list)


def complete_inner_edges(update_var=False, input_nodes=[], input_edges=[]):
    '''
    :param update_var: variable to check if call is from update_graph with known nodes and edges or fresh construction
@@ -139,7 +143,8 @@ def complete_inner_edges(update_var = False, input_nodes = [], input_edges = [])
                edges.append([node.doi_url, reference.doi_url])


def init_graph_construction(doi_input_list, search_depth, search_height, test_var=False, update_var=False,
                            input_nodes=[], input_edges=[]):
    '''
    :param doi_input_list: input list of doi from UI
    :type doi_input_list: List[String]
@@ -150,7 +155,7 @@ def init_graph_construction(doi_input_list, search_depth, search_height, test_va
    :param search_depth: maximum depth to search for references
    :type search_depth: int

    :param test_var: variable to differentiate between test and url call
    :type test_var: boolean

    :param update_var: variable to check if call is from update_graph with known nodes and edges or fresh construction
@@ -177,7 +182,6 @@ def init_graph_construction(doi_input_list, search_depth, search_height, test_va
    if (search_depth < 0):
        print("Error, search_depth of search must be positive")

    # creates empty lists to save nodes and edges
    global nodes, edges, error_doi_list

    if update_var:
@@ -189,7 +193,8 @@ def init_graph_construction(doi_input_list, search_depth, search_height, test_va
        error_doi_list = []

    # initializes nodes/edges from input and gets a list with publication objects for citations and references returned
    references_obj_list, citations_obj_list = initialize_nodes_list(doi_input_list, search_depth, search_height,
                                                                    test_var)
    # function calls to begin recursive processing up to max depth/height
    add_citations(nodes, edges, citations_obj_list, 1, search_height, "Citation", test_var)
@@ -205,5 +210,4 @@ def init_graph_construction(doi_input_list, search_depth, search_height, test_va
    del edges
    gc.collect()

    return (new_nodes, new_edges, error_doi_list)
This folder is only for our internal use, to start test runs with real DOIs.
@@ -4,9 +4,10 @@ Functions to test and print the nodes and edges sets
"""
__authors__ = "Donna Löding, Alina Molkentin, Judith Große, Malte Schokolowski"
__email__ = "cis-project2021@zbh.uni-hamburg.de"
__status__ = "Production"
# __copyright__ = ""
# __credits__ = ["", "", "", ""]
# __license__ = ""
@@ -22,6 +23,7 @@ from verarbeitung.construct_new_graph.initialize_graph import init_graph_constru
from verarbeitung.update_graph.import_from_json import input_from_json
from verarbeitung.update_graph.update_graph import update_graph


# a function to print nodes and edges from a graph
def print_graph(nodes, edges):
    print("Knoten:\n")
@@ -34,6 +36,7 @@ def print_graph(nodes, edges):
    print(len(edges))
    print(" ")


def print_extended_graph(nodes, edges):
    print("Knoten:\n")
    for node in nodes:
@@ -50,6 +53,7 @@ def print_extended_graph(nodes, edges):
    print(len(edges))
    print(" ")


def print_simple(nodes, edges):
    # for node in nodes:
    #     print(node)
@@ -59,7 +63,8 @@ def print_simple(nodes, edges):
    print(len(edges))
    print(" ")
# program test with some random DOIs
def try_known_publications():
    doi_list = []
    doi_list.append('https://pubs.acs.org/doi/10.1021/acs.jcim.9b00249')
@@ -76,13 +81,13 @@ def try_known_publications():
    # url = sys.argv[1]
    # arr.append[url]

    nodes, edges = init_graph_construction(doi_list, 2, 2)

    print_graph(nodes, edges)

    return (nodes, edges)


def try_delete_nodes():
    doi_list = []
    doi_list.append('https://pubs.acs.org/doi/10.1021/acs.jcim.9b00249')
@@ -96,10 +101,12 @@ def try_delete_nodes():
    # valid_nodes, valid_edges = update_graph(doi_list, list_of_nodes_py, list_of_edges_py)
    # print_simple(valid_nodes, valid_edges)


def try_import():
    nodes, edges = input_from_json('json_text.json')

    print_extended_graph(nodes, edges)


# nodes, edges = try_known_publications()
# nodes_new, edges_new = input_from_json("json_text.json")
# print_graph(nodes_new, edges_new)
...
# -*- coding: utf-8 -*-
"""
A function to return an object of Type Publication for a given DOI
"""
__authors__ = "Donna Löding, Alina Molkentin, Judith Große, Malte Schokolowski"
__email__ = "cis-project2021@zbh.uni-hamburg.de"
__status__ = "Production"
#__copyright__ = ""
@@ -27,16 +27,16 @@ def get_pub(pub_doi, test_var):
    :param pub_doi: input doi to get Publication object for
    :type pub_doi: String

    :param test_var: variable to differentiate between test and url call
    :type test_var: boolean

    function to return an object of type Publication for given input doi depending on whether it's a test or url DOI
    '''

    # checks if it's a test and chooses appropriate function
    if (test_var):
        pub = input_test_func(pub_doi)

    # checks that it isn't a test and chooses standard-input function
    else:
        inter = Input()
        try:
...
@@ -4,7 +4,7 @@ main function to call to generate a graph representing citations between multipl
"""
__authors__ = "Donna Löding, Alina Molkentin, Judith Große, Malte Schokolowski"
__email__ = "cis-project2021@zbh.uni-hamburg.de"
__status__ = "Production"
#__copyright__ = ""
@@ -24,6 +24,7 @@ from verarbeitung.construct_new_graph.export_to_json import output_to_json
from verarbeitung.construct_new_graph.initialize_graph import init_graph_construction
from verarbeitung.update_graph.update_graph import update_graph


def Processing(url_list, search_depth, search_height, json_file = 'json_text.json'):
    '''
    :param url_list: list of urls to construct publication graph for
@@ -53,4 +54,3 @@ def Processing(url_list, search_depth, search_height, json_file = 'json_text.jso
    output_to_json(nodes, edges, search_depth, search_height, json_file)

    return error_doi_list
@@ -15,11 +15,10 @@ construct_graph_unittest.py
update_graph_unittest.py
- Runs various tests for updating an old graph with an updated input list, using our own examples and our Input_test function (see the sketch below).
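A hypothetical sketch of the pattern these update tests exercise. The function names come from the files in this merge request; the exact `update_graph` signature is only inferred from commented-out calls in verarbeitung_test.py and may differ.
```python
from verarbeitung.construct_new_graph.initialize_graph import init_graph_construction
from verarbeitung.update_graph.update_graph import update_graph

# build a graph from the offline test data (test_var=True uses input_test_func instead of real web requests)
nodes, edges, error_doi_list = init_graph_construction(['doi_lg_1_i', 'doi_lg_2_i'], 1, 1, True, False)

# update the graph against a shrunken input list; signature assumed
valid_nodes, valid_edges = update_graph(['doi_lg_1_i'], nodes, edges)
```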
## Authors
- Donna Löding
- Alina Molkentin
- Judith Große
- Malte Schokolowski
@@ -4,15 +4,16 @@ import sys
sys.path.append("../")

from verarbeitung.construct_new_graph.initialize_graph import init_graph_construction, initialize_nodes_list_test, \
    complete_inner_edges_test
from verarbeitung.construct_new_graph.add_citations_rec import get_cit_type_list, create_graph_structure_citations_test
from verarbeitung.construct_new_graph.export_to_json import format_nodes, format_edges
from verarbeitung.get_pub_from_input import input_test_func


class ConstructionTest(unittest.TestCase):
    maxDiff = None

    def testCycle(self):
        nodes, edges, err_list = init_graph_construction(['doiz1'], 1, 1, True, False)
        doi_nodes = keep_only_dois(nodes)
@@ -24,7 +25,6 @@ class ConstructionTest(unittest.TestCase):
        self.assertCountEqual(doi_nodes, ['doiz1', 'doiz2'])
        self.assertCountEqual(edges, [['doiz2', 'doiz1'], ['doiz1', 'doiz2']])

    def testEmptyDepthHeight(self):
        nodes, edges, err_list = init_graph_construction(['doi1'], 0, 0, True, False)
        doi_nodes = keep_only_dois(nodes)
@@ -41,7 +41,6 @@ class ConstructionTest(unittest.TestCase):
        self.assertCountEqual(doi_nodes, ['doi1', 'doi2', 'doi3'])
        self.assertCountEqual(edges, [['doi3', 'doi1'], ['doi1', 'doi2']])

    def testInnerEdges(self):
        nodes, edges, err_list = init_graph_construction(['doi_ie1'], 1, 1, True, False)
        doi_nodes = keep_only_dois(nodes)
@@ -99,23 +98,25 @@ class ConstructionTest(unittest.TestCase):
        self.assertCountEqual(edges, [['doi1', 'doi2'], ['doi3', 'doi1']])
        self.assertCountEqual(err_list, ['doi2ic'])
    ## From here the tests for the individual functions ##

    # initialize_graph.py:
    def test_initialize_nodes_list(self):
        references_pub_obj_list, citations_pub_obj_list = initialize_nodes_list_test(['doi_lg_1_i', 'doi_lg_2_i'], 0, 0,
                                                                                     True)

        doi_references = keep_only_dois(references_pub_obj_list)
        doi_citations = keep_only_dois(citations_pub_obj_list)

        self.assertCountEqual(doi_references, [])
        self.assertCountEqual(doi_citations, [])

        references_pub_obj_list, citations_pub_obj_list = initialize_nodes_list_test(['doi_lg_1_i', 'doi_lg_2_i'], 1, 1,
                                                                                     True)

        doi_references = keep_only_dois(references_pub_obj_list)
        doi_citations = keep_only_dois(citations_pub_obj_list)

        self.assertCountEqual(doi_references, ['doi_lg_1_d11', 'doi_lg_1_d12', 'doi_lg_2_d11', 'doi_lg_2_d12'])
        self.assertCountEqual(doi_citations,
                              ['doi_lg_1_h11', 'doi_lg_1_h12', 'doi_cg_i', 'doi_lg_2_h11', 'doi_lg_2_h12'])
    def test_complete_inner_edges(self):
        pub_lg_1_i = input_test_func('doi_lg_1_i')
@@ -128,7 +129,8 @@ class ConstructionTest(unittest.TestCase):
        edges = [['doi_lg_1_i', 'doi_lg_1_d12'], ['doi_lg_1_h12', 'doi_lg_1_i']]

        processed_nodes, processed_edges = complete_inner_edges_test(nodes, edges)

        self.assertCountEqual(processed_nodes, [pub_lg_1_i, pub_lg_1_h_12, pub_lg_1_d_12])
        self.assertCountEqual(processed_edges, [['doi_lg_1_i', 'doi_lg_1_d12'], ['doi_lg_1_h12', 'doi_lg_1_i'],
                                                ['doi_lg_1_h12', 'doi_lg_1_d12']])
    # add_citations_rec.py:
@@ -164,41 +166,68 @@ class ConstructionTest(unittest.TestCase):
        pub_lg_1_d_12.group = -1

        # checks if citations/references are found and added
        return_nodes, return_edges, cit_list = create_graph_structure_citations_test(
            pub_lg_1_i, 1, 2, "Citation", True, [pub_lg_1_i, pub_lg_1_d_11, pub_lg_1_d_12],
            [['doi_lg_1_i', 'doi_lg_1_d11'], ['doi_lg_1_i', 'doi_lg_1_d12']])
        self.assertCountEqual(return_nodes, [pub_lg_1_i, pub_lg_1_h_11, pub_lg_1_h_12, pub_lg_1_d_11, pub_lg_1_d_12])
        self.assertCountEqual(return_edges, [['doi_lg_1_i', 'doi_lg_1_d11'], ['doi_lg_1_i', 'doi_lg_1_d12'],
                                             ['doi_lg_1_h11', 'doi_lg_1_i'], ['doi_lg_1_h12', 'doi_lg_1_i']])
        self.assertCountEqual(cit_list, [pub_lg_1_h_11, pub_lg_1_h_12])
        return_nodes, return_edges, cit_list = create_graph_structure_citations_test(
            pub_lg_1_i, 1, 2, "Reference", True, [pub_lg_1_i, pub_lg_1_h_11, pub_lg_1_h_12],
            [['doi_lg_1_h11', 'doi_lg_1_i'], ['doi_lg_1_h12', 'doi_lg_1_i']])
        self.assertCountEqual(return_nodes, [pub_lg_1_i, pub_lg_1_h_11, pub_lg_1_h_12, pub_lg_1_d_11, pub_lg_1_d_12])
        self.assertCountEqual(return_edges, [['doi_lg_1_i', 'doi_lg_1_d11'], ['doi_lg_1_i', 'doi_lg_1_d12'],
                                             ['doi_lg_1_h11', 'doi_lg_1_i'], ['doi_lg_1_h12', 'doi_lg_1_i']])
        self.assertCountEqual(cit_list, [pub_lg_1_d_11, pub_lg_1_d_12])
        # checks if max depth/height is checked before added
        return_nodes, return_edges, cit_list = create_graph_structure_citations_test(
            pub_lg_1_i, 1, 1, "Citation", True, [pub_lg_1_i, pub_lg_1_d_11, pub_lg_1_d_12],
            [['doi_lg_1_i', 'doi_lg_1_d11'], ['doi_lg_1_i', 'doi_lg_1_d12']])
        self.assertCountEqual(return_nodes, [pub_lg_1_i, pub_lg_1_d_11, pub_lg_1_d_12])
        self.assertCountEqual(return_edges, [['doi_lg_1_i', 'doi_lg_1_d11'], ['doi_lg_1_i', 'doi_lg_1_d12']])
        self.assertCountEqual(cit_list, [])
        return_nodes, return_edges, cit_list = create_graph_structure_citations_test(
            pub_lg_1_i, 1, 1, "Reference", True, [pub_lg_1_i, pub_lg_1_h_11, pub_lg_1_h_12],
            [['doi_lg_1_h11', 'doi_lg_1_i'], ['doi_lg_1_h12', 'doi_lg_1_i']])
        self.assertCountEqual(return_nodes, [pub_lg_1_i, pub_lg_1_h_11, pub_lg_1_h_12])
        self.assertCountEqual(return_edges, [['doi_lg_1_h11', 'doi_lg_1_i'], ['doi_lg_1_h12', 'doi_lg_1_i']])
        self.assertCountEqual(cit_list, [])
        # checks if max depth/height is checked before added but citation/reference from max depth/height found and added
        return_nodes, return_edges, cit_list = create_graph_structure_citations_test(
            pub_lg_1_i, 1, 1, "Citation", True, [pub_lg_1_i, pub_lg_1_d_11, pub_lg_1_d_12, pub_lg_1_h_11],
            [['doi_lg_1_i', 'doi_lg_1_d11'], ['doi_lg_1_i', 'doi_lg_1_d12']])
        self.assertCountEqual(return_nodes, [pub_lg_1_i, pub_lg_1_d_11, pub_lg_1_d_12, pub_lg_1_h_11])
        self.assertCountEqual(return_edges, [['doi_lg_1_i', 'doi_lg_1_d11'], ['doi_lg_1_i', 'doi_lg_1_d12'],
                                             ['doi_lg_1_h11', 'doi_lg_1_i']])
        self.assertCountEqual(cit_list, [])
        return_nodes, return_edges, cit_list = create_graph_structure_citations_test(
            pub_lg_1_i, 1, 1, "Reference", True, [pub_lg_1_i, pub_lg_1_h_11, pub_lg_1_h_12, pub_lg_1_d_11],
            [['doi_lg_1_h11', 'doi_lg_1_i'], ['doi_lg_1_h12', 'doi_lg_1_i']])
        self.assertCountEqual(return_nodes, [pub_lg_1_i, pub_lg_1_h_11, pub_lg_1_h_12, pub_lg_1_d_11])
        self.assertCountEqual(return_edges, [['doi_lg_1_h11', 'doi_lg_1_i'], ['doi_lg_1_h12', 'doi_lg_1_i'],
                                             ['doi_lg_1_i', 'doi_lg_1_d11']])
        self.assertCountEqual(cit_list, [])
    # export_to_json.py:
    def test_format_nodes(self):
        pub_lg_1_i = input_test_func('doi_lg_1_i')
@@ -209,31 +238,36 @@ class ConstructionTest(unittest.TestCase):
        pub_lg_1_d_11.group = -1

        return_list_of_node_dicts = format_nodes([pub_lg_1_i, pub_lg_1_h_11, pub_lg_1_d_11])
        check_list_of_node_dicts = [
            {"doi": 'doi_lg_1_i', "name": 'title_lg_1_i', "author": ['contributor_lg_1_i'], "year": 'date_lg_1_i',
             "journal": 'journal_lg_1_i', "abstract": None, "group": 'Input', "depth": 0, "citations": 2},
            {"doi": 'doi_lg_1_h11', "name": 'title_lg_1_h11', "author": ['contributor_lg_1_h11'],
             "year": 'date_lg_1_h11', "journal": 'journal_lg_1_h11', "abstract": None, "group": 'Citedby', "depth": 1,
             "citations": 2},
            {"doi": 'doi_lg_1_d11', "name": 'title_lg_1_d11', "author": ['contributor_lg_1_d11'],
             "year": 'date_lg_1_d11', "journal": 'journal_lg_1_d11', "abstract": None, "group": 'Reference',
             "depth": -1, "citations": 1}]

        self.assertCountEqual(return_list_of_node_dicts, check_list_of_node_dicts)
    def test_format_edges(self):
        return_list_of_edges = format_edges(
            [['doi_lg_1_i', 'doi_lg_1_d11'], ['doi_lg_1_i', 'doi_lg_1_d12'], ['doi_lg_1_h11', 'doi_lg_1_i'],
             ['doi_lg_1_h12', 'doi_lg_1_i']])
        check_list_of_edges = [{"source": 'doi_lg_1_i', "target": 'doi_lg_1_d11'},
                               {"source": 'doi_lg_1_i', "target": 'doi_lg_1_d12'},
                               {"source": 'doi_lg_1_h11', "target": 'doi_lg_1_i'},
                               {"source": 'doi_lg_1_h12', "target": 'doi_lg_1_i'}]

        self.assertCountEqual(return_list_of_edges, check_list_of_edges)
def keep_only_dois(nodes):
    '''
    :param nodes: input list of nodes of type Publication
    :type nodes: List[Publication]

    gets nodes of type pub and returns only their DOI
    '''
    doi_list = []
    for node in nodes:
...
import sys

sys.path.append("../")
from input.publication import Publication, Citation
@@ -6,10 +7,10 @@ from input.publication import Publication, Citation


def input_test_func(pub_doi):
    '''
    :param pub_doi: pub DOI to find publication in list_of_arrays
    :type pub_doi: String

    returns the publication class for given DOI
    '''

    for array in list_of_arrays:
@@ -22,10 +23,10 @@ def input_test_func(pub_doi):

def cit(list_doi, cit_type):
    '''
    :param list_doi: list of citation DOIs to get their Citation Class
    :type list_doi: List[String]

    returns a list of citation objects for given DOI list
    '''
    cits = []
@@ -36,69 +37,112 @@ def cit(list_doi, cit_type):
    return cits
beispiel1 = ['doi1', 'title1', ['contributor1'], 'journal1', 'date1', ['subject1'], ['doi2'], ['doi3']]
beispiel2 = ['doi2', 'title2', ['contributor2'], 'journal2', 'date2', ['subject2'], [], ['doi1']]
beispiel3 = ['doi3', 'title3', ['contributor3'], 'journal3', 'date3', ['subject3'], ['doi1'], []]

zyklus1 = ['doiz1', 'titlez1', ['contributorz1.1', 'contributorz1.2'], 'journalz1', 'datez1', ['subjectz1'], ['doiz2'],
           ['doiz2']]
zyklus2 = ['doiz2', 'titlez2', ['contributorz2.1', 'contributorz2.2'], 'journalz2', 'datez2', ['subjectz1'], ['doiz1'],
           ['doiz1']]

inner_edge1 = ['doi_ie1', 'title_ie1', ['contributor_ie1.1', 'contributor_ie1.2'], 'journal_ie1', 'date_ie1',
               ['subject_ie1'], ['doi_ie2'], ['doi_ie3']]
inner_edge2 = ['doi_ie2', 'title_ie2', ['contributor_ie2.1', 'contributor_ie2.2'], 'journal_ie2', 'date_ie2',
               ['subject_ie2'], [], ['doi_ie1', 'doi_ie3']]
inner_edge3 = ['doi_ie3', 'titlez_ie3', ['contributor_ie3.1', 'contributor_ie3.2'], 'journal_ie3', 'date_ie3',
               ['subject_ie3'], ['doi_ie1', 'doi_ie2'], []]

right_height01 = ['doi_h01', 'title_h01', ['contributor_h01'], 'journal_h01', 'date_h01', ['subject_h01'], [], []]
right_height02 = ['doi_h02', 'title_h02', ['contributor_h02'], 'journal_h02', 'date_h02', ['subject_h02'], [],
                  ['doi_h1']]
right_height1 = ['doi_h1', 'title_h1', ['contributor_h1'], 'journal_h1', 'date_h1', ['subject_h1'], [], ['doi_h2']]
right_height2 = ['doi_h2', 'title_h2', ['contributor_h2'], 'journal_h2', 'date_h2', ['subject_h2'], [], ['doi_h3']]
right_height3 = ['doi_h3', 'title_h3', ['contributor_h3'], 'journal_h3', 'date_h3', ['subject_h3'], [], []]

right_depth01 = ['doi_d01', 'title_d01', ['contributor_d01'], 'journal_d01', 'date_d01', ['subject_d01'], [], []]
right_depth02 = ['doi_d02', 'title_d02', ['contributor_d02'], 'journal_d02', 'date_d02', ['subject_d01'], ['doi_d1'],
                 []]
right_depth1 = ['doi_d1', 'title_d1', ['contributor_d1'], 'journal_d1', 'date_d1', ['subject_d1'], ['doi_d2'], []]
right_depth2 = ['doi_d2', 'title_d2', ['contributor_d2'], 'journal_d2', 'date_d2', ['subject_d2'], ['doi_d3'], []]
right_depth3 = ['doi_d3', 'title_d3', ['contributor_d3'], 'journal_d3', 'date_d3', ['subject_d3'], [], []]
large_graph_1_h21 = ['doi_lg_1_h21', 'title_lg_1_h21', ['contributor_lg_1_h21'], 'journal_lg_1_h21', 'date_lg_1_h21', ['subject_lg_1_h21'], ['doi_lg_1_h11'], []]
large_graph_1_h22 = ['doi_lg_1_h22', 'title_lg_1_h22', ['contributor_lg_1_h22'], 'journal_lg_1_h22', 'date_lg_1_h22', ['subject_lg_1_h22'], ['doi_lg_1_h11', 'doi_lg_1_h12'], []]
large_graph_1_h23 = ['doi_lg_1_h23', 'title_lg_1_h23', ['contributor_lg_1_h23'], 'journal_lg_1_h23', 'date_lg_1_h23', ['subject_lg_1_h23'], ['doi_lg_1_h12', 'doi_cg_i'], []]
large_graph_1_h11 = ['doi_lg_1_h11', 'title_lg_1_h11', ['contributor_lg_1_h11'], 'journal_lg_1_h11', 'date_lg_1_h11', ['subject_lg_1_h11'], ['doi_lg_1_i'], ['doi_lg_1_h21', 'doi_lg_1_h22']]
large_graph_1_h12 = ['doi_lg_1_h12', 'title_lg_1_h12', ['contributor_lg_1_h12'], 'journal_lg_1_h12', 'date_lg_1_h12', ['subject_lg_1_h12'], ['doi_lg_1_i', 'doi_lg_1_d12'], ['doi_lg_1_h22', 'doi_lg_1_h23']]
large_graph_1_i = ['doi_lg_1_i', 'title_lg_1_i', ['contributor_lg_1_i'], 'journal_lg_1_i', 'date_lg_1_i', ['subject_lg_1_i'], ['doi_lg_1_d11', 'doi_lg_1_d12'], ['doi_lg_1_h11', 'doi_lg_1_h12']]
large_graph_1_d11 = ['doi_lg_1_d11', 'title_lg_1_d11', ['contributor_lg_1_d11'], 'journal_lg_1_d11', 'date_lg_1_d11', ['subject_lg_1_d11'], ['doi_lg_1_d21', 'doi_lg_1_d22'], ['doi_lg_1_i']]
large_graph_1_d12 = ['doi_lg_1_d12', 'title_lg_1_d12', ['contributor_lg_1_d12'], 'journal_lg_1_d12', 'date_lg_1_d12', ['subject_lg_1_d12'], ['doi_lg_1_d23'], ['doi_lg_1_h12', 'doi_lg_1_i']]
large_graph_1_d21 = ['doi_lg_1_d21', 'title_lg_1_d21', ['contributor_lg_1_d21'], 'journal_lg_1_d21', 'date_lg_1_d21', ['subject_lg_1_d21'], ['doi_lg_1_d22'], ['doi_lg_1_d11', 'doi_lg_1_d22']]
large_graph_1_d22 = ['doi_lg_1_d22', 'title_lg_1_d22', ['contributor_lg_1_d22'], 'journal_lg_1_d22', 'date_lg_1_d22', ['subject_lg_1_d22'], ['doi_lg_1_d21'], ['doi_lg_1_d11', 'doi_lg_1_d21']]
large_graph_1_d23 = ['doi_lg_1_d23', 'title_lg_1_d23', ['contributor_lg_1_d23'], 'journal_lg_1_d23', 'date_lg_1_d23', ['subject_lg_1_d23'], [], ['doi_lg_1_d12', 'doi_cg_d11']]

large_graph_2_h21 = ['doi_lg_2_h21', 'title_lg_2_h21', ['contributor_lg_2_h21'], 'journal_lg_2_h21', 'date_lg_2_h21', ['subject_lg_2_h21'], ['doi_lg_2_h11'], []]
large_graph_2_h22 = ['doi_lg_2_h22', 'title_lg_2_h22', ['contributor_lg_2_h22'], 'journal_lg_2_h22', 'date_lg_2_h22', ['subject_lg_2_h22'], ['doi_lg_2_h11'], []]
large_graph_2_h23 = ['doi_lg_2_h23', 'title_lg_2_h23', ['contributor_lg_2_h23'], 'journal_lg_2_h23', 'date_lg_2_h23', ['subject_lg_2_h23'], ['doi_lg_2_h12', 'doi_lg_2_h24'], ['doi_lg_2_h24']]
large_graph_2_h24 = ['doi_lg_2_h24', 'title_lg_2_h24', ['contributor_lg_2_h24'], 'journal_lg_2_h24', 'date_lg_2_h24', ['subject_lg_2_h24'], ['doi_lg_2_h12', 'doi_lg_2_h23', 'doi_lg_2_d12'], ['doi_lg_2_h23']]
large_graph_2_h11 = ['doi_lg_2_h11', 'title_lg_2_h11', ['contributor_lg_2_h11'], 'journal_lg_2_h11', 'date_lg_2_h11', ['subject_lg_2_h11'], ['doi_lg_2_i', 'doi_cg_i'], ['doi_lg_2_h21', 'doi_lg_2_h22']]
large_graph_2_h12 = ['doi_lg_2_h12', 'title_lg_2_h12', ['contributor_lg_2_h12'], 'journal_lg_2_h12', 'date_lg_2_h12', ['subject_lg_2_h12'], ['doi_lg_2_i'], ['doi_lg_2_h23', 'doi_lg_2_h24']]
large_graph_2_i = ['doi_lg_2_i', 'title_lg_2_i', ['contributor_lg_2_i'], 'journal_lg_2_i', 'date_lg_2_i', ['subject_lg_2_i'], ['doi_lg_2_d11', 'doi_lg_2_d12'], ['doi_lg_2_h11', 'doi_lg_2_h12', 'doi_cg_i', 'doi_lg_2_h11']]
large_graph_2_d11 = ['doi_lg_2_d11', 'title_lg_2_d11', ['contributor_lg_2_d11'], 'journal_lg_2_d11', 'date_lg_2_d11', ['subject_lg_2_d11'], ['doi_lg_2_i', 'doi_lg_2_d21'], ['doi_lg_2_i']]
large_graph_2_d12 = ['doi_lg_2_d12', 'title_lg_2_d12', ['contributor_lg_2_d12'], 'journal_lg_2_d12', 'date_lg_2_d12', ['subject_lg_2_d12'], ['doi_lg_2_d22', 'doi_lg_2_d23', 'doi_lg_2_d24'], ['doi_lg_2_h24', 'doi_lg_2_i']]
large_graph_2_d21 = ['doi_lg_2_d21', 'title_lg_2_d21', ['contributor_lg_2_d21'], 'journal_lg_2_d21', 'date_lg_2_d21', ['subject_lg_2_d21'], [], ['doi_lg_2_d11']]
large_graph_2_d22 = ['doi_lg_2_d22', 'title_lg_2_d22', ['contributor_lg_2_d22'], 'journal_lg_2_d22', 'date_lg_2_d22', ['subject_lg_2_d22'], [], ['doi_lg_2_d12']]
large_graph_2_d23 = ['doi_lg_2_d23', 'title_lg_2_d23', ['contributor_lg_2_d23'], 'journal_lg_2_d23', 'date_lg_2_d23', ['subject_lg_2_d23'], [], ['doi_lg_2_d12']]
large_graph_2_d24 = ['doi_lg_2_d24', 'title_lg_2_d24', ['contributor_lg_2_d24'], 'journal_lg_2_d24', 'date_lg_2_d24', ['subject_lg_2_d24'], [], ['doi_lg_2_d12']]

crossed_graph_h21 = ['doi_cg_h21', 'title_cg_h21', ['contributor_cg_h21'], 'journal_cg_h21', 'date_cg_h21', ['subject_cg_h21'], ['doi_cg_h11'], []]
crossed_graph_h22 = ['doi_cg_h22', 'title_cg_h22', ['contributor_cg_h22'], 'journal_cg_h22', 'date_cg_h22', ['subject_cg_h22'], ['doi_cg_h11'], []]
crossed_graph_h11 = ['doi_cg_h11', 'title_cg_h11', ['contributor_cg_h11'], 'journal_cg_h11', 'date_cg_h11', ['subject_cg_h11'], ['doi_cg_i'], ['doi_cg_h21', 'doi_cg_h22']]
crossed_graph_i = ['doi_cg_i', 'title_cg_i', ['contributor_cg_i'], 'journal_cg_i', 'date_cg_i', ['subject_cg_i'], ['doi_lg_2_i', 'doi_cg_d11', 'doi_cg_d12'], ['doi_lg_1_h23', 'doi_cg_h11', 'doi_lg_2_h11']]
crossed_graph_d11 = ['doi_cg_d11', 'title_cg_d11', ['contributor_cg_d11'], 'journal_cg_d11', 'date_cg_d11', ['subject_cg_d11'], ['doi_lg_1_d23', 'doi_cg_d21'], ['doi_cg_i']]
crossed_graph_d12 = ['doi_cg_d12', 'title_cg_d12', ['contributor_cg_d12'], 'journal_cg_d12', 'date_cg_d12', ['subject_cg_d12'], ['doi_cg_d22'], ['doi_cg_i']]
crossed_graph_d21 = ['doi_cg_d21', 'title_cg_d21', ['contributor_cg_d21'], 'journal_cg_d21', 'date_cg_d21', ['subject_cg_d21'], [], ['doi_cg_d11']] ['subject_lg_2_h11'], ['doi_lg_2_i', 'doi_cg_i'], ['doi_lg_2_h21', 'doi_lg_2_h22']]
crossed_graph_d22 = ['doi_cg_d22', 'title_cg_d22', ['contributor_cg_d22'], 'journal_cg_d22', 'date_cg_d22', ['subject_cg_d22'], [], ['doi_cg_d12']] large_graph_2_h12 = ['doi_lg_2_h12', 'title_lg_2_h12', ['contributor_lg_2_h12'], 'journal_lg_2_h12', 'date_lg_2_h12',
['subject_lg_2_h12'], ['doi_lg_2_i'], ['doi_lg_2_h23', 'doi_lg_2_h24']]
large_graph_2_i = ['doi_lg_2_i', 'title_lg_2_i', ['contributor_lg_2_i'], 'journal_lg_2_i', 'date_lg_2_i',
['subject_lg_2_i'], ['doi_lg_2_d11', 'doi_lg_2_d12'],
['doi_lg_2_h11', 'doi_lg_2_h12', 'doi_cg_i', 'doi_lg_2_h11']]
large_graph_2_d11 = ['doi_lg_2_d11', 'title_lg_2_d11', ['contributor_lg_2_d11'], 'journal_lg_2_d11', 'date_lg_2_d11',
['subject_lg_2_d11'], ['doi_lg_2_i', 'doi_lg_2_d21'], ['doi_lg_2_i']]
large_graph_2_d12 = ['doi_lg_2_d12', 'title_lg_2_d12', ['contributor_lg_2_d12'], 'journal_lg_2_d12', 'date_lg_2_d12',
['subject_lg_2_d12'], ['doi_lg_2_d22', 'doi_lg_2_d23', 'doi_lg_2_d24'],
['doi_lg_2_h24', 'doi_lg_2_i']]
large_graph_2_d21 = ['doi_lg_2_d21', 'title_lg_2_d21', ['contributor_lg_2_d21'], 'journal_lg_2_d21', 'date_lg_2_d21',
['subject_lg_2_d21'], [], ['doi_lg_2_d11']]
large_graph_2_d22 = ['doi_lg_2_d22', 'title_lg_2_d22', ['contributor_lg_2_d22'], 'journal_lg_2_d22', 'date_lg_2_d22',
['subject_lg_2_d22'], [], ['doi_lg_2_d12']]
large_graph_2_d23 = ['doi_lg_2_d23', 'title_lg_2_d23', ['contributor_lg_2_d23'], 'journal_lg_2_d23', 'date_lg_2_d23',
['subject_lg_2_d23'], [], ['doi_lg_2_d12']]
large_graph_2_d24 = ['doi_lg_2_d24', 'title_lg_2_d24', ['contributor_lg_2_d24'], 'journal_lg_2_d24', 'date_lg_2_d24',
['subject_lg_2_d24'], [], ['doi_lg_2_d12']]
crossed_graph_h21 = ['doi_cg_h21', 'title_cg_h21', ['contributor_cg_h21'], 'journal_cg_h21', 'date_cg_h21',
['subject_cg_h21'], ['doi_cg_h11'], []]
crossed_graph_h22 = ['doi_cg_h22', 'title_cg_h22', ['contributor_cg_h22'], 'journal_cg_h22', 'date_cg_h22',
['subject_cg_h22'], ['doi_cg_h11'], []]
crossed_graph_h11 = ['doi_cg_h11', 'title_cg_h11', ['contributor_cg_h11'], 'journal_cg_h11', 'date_cg_h11',
['subject_cg_h11'], ['doi_cg_i'], ['doi_cg_h21', 'doi_cg_h22']]
crossed_graph_i = ['doi_cg_i', 'title_cg_i', ['contributor_cg_i'], 'journal_cg_i', 'date_cg_i', ['subject_cg_i'],
['doi_lg_2_i', 'doi_cg_d11', 'doi_cg_d12'], ['doi_lg_1_h23', 'doi_cg_h11', 'doi_lg_2_h11']]
crossed_graph_d11 = ['doi_cg_d11', 'title_cg_d11', ['contributor_cg_d11'], 'journal_cg_d11', 'date_cg_d11',
['subject_cg_d11'], ['doi_lg_1_d23', 'doi_cg_d21'], ['doi_cg_i']]
crossed_graph_d12 = ['doi_cg_d12', 'title_cg_d12', ['contributor_cg_d12'], 'journal_cg_d12', 'date_cg_d12',
['subject_cg_d12'], ['doi_cg_d22'], ['doi_cg_i']]
crossed_graph_d21 = ['doi_cg_d21', 'title_cg_d21', ['contributor_cg_d21'], 'journal_cg_d21', 'date_cg_d21',
['subject_cg_d21'], [], ['doi_cg_d11']]
crossed_graph_d22 = ['doi_cg_d22', 'title_cg_d22', ['contributor_cg_d22'], 'journal_cg_d22', 'date_cg_d22',
['subject_cg_d22'], [], ['doi_cg_d12']]
list_of_arrays = [beispiel1, beispiel2, beispiel3, zyklus1, zyklus2, inner_edge1, inner_edge2, inner_edge3, list_of_arrays = [beispiel1, beispiel2, beispiel3, zyklus1, zyklus2, inner_edge1, inner_edge2, inner_edge3,
right_height01, right_height02, right_height1, right_height2, right_height3, right_depth01, right_depth02, right_depth1, right_depth2, right_depth3, right_height01, right_height02, right_height1, right_height2, right_height3, right_depth01,
large_graph_1_h21, large_graph_1_h22, large_graph_1_h23, large_graph_1_h11, large_graph_1_h12, large_graph_1_i, large_graph_1_d11, large_graph_1_d12, right_depth02, right_depth1, right_depth2, right_depth3,
large_graph_1_d21, large_graph_1_d22, large_graph_1_d23, large_graph_2_h21, large_graph_2_h22, large_graph_2_h23, large_graph_2_h24, large_graph_2_h11, large_graph_2_h12, large_graph_1_h21, large_graph_1_h22, large_graph_1_h23, large_graph_1_h11, large_graph_1_h12,
large_graph_2_i, large_graph_2_d11, large_graph_2_d12, large_graph_2_d21, large_graph_2_d22, large_graph_2_d23, large_graph_2_d24, crossed_graph_h21, crossed_graph_h22, crossed_graph_h11, large_graph_1_i, large_graph_1_d11, large_graph_1_d12,
large_graph_1_d21, large_graph_1_d22, large_graph_1_d23, large_graph_2_h21, large_graph_2_h22,
large_graph_2_h23, large_graph_2_h24, large_graph_2_h11, large_graph_2_h12,
large_graph_2_i, large_graph_2_d11, large_graph_2_d12, large_graph_2_d21, large_graph_2_d22,
large_graph_2_d23, large_graph_2_d24, crossed_graph_h21, crossed_graph_h22, crossed_graph_h11,
crossed_graph_i, crossed_graph_d11, crossed_graph_d12, crossed_graph_d21, crossed_graph_d22] crossed_graph_i, crossed_graph_d11, crossed_graph_d12, crossed_graph_d21, crossed_graph_d22]
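# Note added for clarity (not part of the committed file): judging from the element names, each test array
# appears to follow the layout
#   [doi, title, [contributors], journal, date, [subjects], [reference DOIs], [citation DOIs]]
# e.g. 'doi_lg_1_h11' lists 'doi_lg_1_i' as its only reference and 'doi_lg_1_h21'/'doi_lg_1_h22' as its citations.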
...@@ -3,7 +3,6 @@ import unittest
import sys
from pathlib import Path

sys.path.append("../")

from verarbeitung.construct_new_graph.initialize_graph import init_graph_construction
...@@ -14,9 +13,11 @@ from verarbeitung.update_graph.update_depth import reduce_max_height_depth_test,
from verarbeitung.update_graph.update_edges import back_to_valid_edges
from verarbeitung.update_graph.delete_nodes_edges import search_ref_cit_graph_rec_test
from verarbeitung.update_graph.compare_old_and_new_node_lists import compare_old_and_new_node_lists
from verarbeitung.update_graph.connect_new_input import find_furthermost_citations_test, \
    complete_changed_group_nodes_test
from verarbeitung.get_pub_from_input import input_test_func
class UpdatingTest(unittest.TestCase):
    maxDiff = None
...@@ -30,7 +31,8 @@ class UpdatingTest(unittest.TestCase):
        nodes_old_single, edges_old_single, err_list = init_graph_construction(['doi_cg_i'], 3, 3, True)
        nodes_old_two, edges_old_two, err_list = init_graph_construction(['doi_lg_1_i', 'doi_cg_i'], 3, 3, True)
        nodes_old_three, edges_old_three, err_list = init_graph_construction(['doi_lg_1_i', 'doi_lg_2_i', 'doi_cg_i'],
                                                                             3, 3, True)

    def test_new_height(self):
        nodes_height_0, edges_height_0, err_list = init_graph_construction(['doi_lg_1_i'], 2, 0, True)
...@@ -77,12 +79,7 @@ class UpdatingTest(unittest.TestCase):
        self.assertCountEqual(new_nodes, nodes)
        self.assertCountEqual(new_edges, edges)

    ## From here the tests for the individual functions ##

    # update_graph.py:
...@@ -96,12 +93,11 @@ class UpdatingTest(unittest.TestCase):
        old_pubs = [pub_lg_1_i, pub_lg_1_h_11, pub_lg_1_d_11]
        self.assertCountEqual(get_old_input_dois(old_pubs), ['doi_lg_1_i'])

    # hard to test because we only have DOIs as test objects and no urls variant
    def test_get_new_input_dois(self):
        new_dois = ['doi_lg_2_i', 'doi_lg_1_i', 'doi_cg_i']
        self.assertCountEqual(get_new_input_dois(new_dois, True), ['doi_lg_2_i', 'doi_lg_1_i', 'doi_cg_i'])

    # update_depth.py:
    def test_reduce_max_height(self):
...@@ -116,9 +112,12 @@ class UpdatingTest(unittest.TestCase):
        pub_lg_2_d_21 = input_test_func('doi_lg_2_d21')
        pub_lg_2_d_21.group = -2
        pubs = [pub_lg_2_i, pub_lg_2_h_11, pub_lg_2_h_21, pub_lg_2_d_11, pub_lg_2_d_21]
        self.assertCountEqual(reduce_max_height_depth_test(pubs, 2, "Height"),
                              [pub_lg_2_i, pub_lg_2_h_11, pub_lg_2_h_21, pub_lg_2_d_11, pub_lg_2_d_21])
        self.assertCountEqual(reduce_max_height_depth_test(pubs, 1, "Height"),
                              [pub_lg_2_i, pub_lg_2_h_11, pub_lg_2_d_11, pub_lg_2_d_21])
        self.assertCountEqual(reduce_max_height_depth_test(pubs, 0, "Height"),
                              [pub_lg_2_i, pub_lg_2_d_11, pub_lg_2_d_21])
    def test_reduce_max_depth(self):
        pub_lg_2_i = input_test_func('doi_lg_2_i')
...@@ -132,9 +131,12 @@ class UpdatingTest(unittest.TestCase):
        pub_lg_2_d_21 = input_test_func('doi_lg_2_d21')
        pub_lg_2_d_21.group = -2
        pubs = [pub_lg_2_i, pub_lg_2_h_11, pub_lg_2_h_21, pub_lg_2_d_11, pub_lg_2_d_21]
        self.assertCountEqual(reduce_max_height_depth_test(pubs, 2, "Depth"),
                              [pub_lg_2_i, pub_lg_2_h_11, pub_lg_2_h_21, pub_lg_2_d_11, pub_lg_2_d_21])
        self.assertCountEqual(reduce_max_height_depth_test(pubs, 1, "Depth"),
                              [pub_lg_2_i, pub_lg_2_d_11, pub_lg_2_h_11, pub_lg_2_h_21])
        self.assertCountEqual(reduce_max_height_depth_test(pubs, 0, "Depth"),
                              [pub_lg_2_i, pub_lg_2_h_11, pub_lg_2_h_21])
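
    # Note added for clarity (not part of the committed file): in these tests pub.group appears to encode the
    # level of a publication in the graph, positive values for citation levels (height) and negative values for
    # reference levels (depth); the input publication itself presumably sits at group 0.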
    def test_get_old_max_references(self):
        pub_lg_2_i = input_test_func('doi_lg_2_i')
...@@ -187,7 +189,8 @@ class UpdatingTest(unittest.TestCase):
        pub_lg_2_d_11 = input_test_func('doi_lg_2_d11')
        pub_lg_2_d_11.group = -1
        pubs = [pub_lg_2_i, pub_lg_2_h_11, pub_lg_2_d_11]
        edges = [['doi_lg_2_h11', 'doi_lg_2_i'], ['doi_lg_2_i', 'doi_lg_2_d11'], ['doi_lg_2_h21', 'doi_lg_2_h11'],
                 ['doi_lg_2_i', 'doi_lg_2_d21']]
        back_to_valid_edges(edges, pubs)
        self.assertCountEqual([['doi_lg_2_h11', 'doi_lg_2_i'], ['doi_lg_2_i', 'doi_lg_2_d11']], edges)
...@@ -219,7 +222,8 @@ class UpdatingTest(unittest.TestCase):
        pub_cg_d12.group = -1
        pub_cg_d11 = input_test_func('doi_cg_d12')
        pub_cg_d11.group = -1
        pubs = [pub_lg_2_i, pub_lg_2_h11, pub_lg_2_h12, pub_lg_2_d11, pub_lg_2_d12, pub_lg_2_h21, pub_lg_2_h22,
                pub_lg_2_d21, pub_cg_i, pub_cg_d11, pub_cg_d12, pub_cg_h11]
        usable_nodes = search_ref_cit_graph_rec_test(pubs, [pub_cg_i], 2, "Citation")
        self.assertCountEqual(usable_nodes, [pub_cg_h11, pub_lg_2_h11, pub_lg_2_h21, pub_lg_2_h22])
...@@ -233,7 +237,6 @@ class UpdatingTest(unittest.TestCase):
        self.assertCountEqual(inserted_nodes, ['doi_cg_i'])
        self.assertCountEqual(deleted_nodes, ['doi_lg_2_i'])
    # connect_new_input.py:
    def test_find_furthermost_citations(self):
...@@ -255,14 +258,18 @@ class UpdatingTest(unittest.TestCase):
        pub_lg_2_d21.group = -2
        pub_lg_2_d22 = input_test_func('doi_lg_2_d22')
        pub_lg_2_d22.group = -2
        pubs = [pub_lg_2_i, pub_lg_2_h11, pub_lg_2_h12, pub_lg_2_d11, pub_lg_2_d12, pub_lg_2_h21, pub_lg_2_h22,
                pub_lg_2_d21, pub_lg_2_d22]

        self.assertCountEqual(find_furthermost_citations_test(pubs, [], pub_lg_2_h11, 2, 2, "Citation"),
                              [pub_lg_2_h21, pub_lg_2_h22])
        self.assertCountEqual(find_furthermost_citations_test(pubs, [], pub_lg_2_h11, 2, 1, "Citation"),
                              [pub_lg_2_h21, pub_lg_2_h22])

        self.assertCountEqual(find_furthermost_citations_test(pubs, [], pub_lg_2_d11, 2, 2, "Reference"),
                              [pub_lg_2_d21, pub_lg_2_i])
        self.assertCountEqual(find_furthermost_citations_test(pubs, [], pub_lg_2_d11, 2, 1, "Reference"),
                              [pub_lg_2_d21, pub_lg_2_i])
    def test_complete_changed_group_nodes(self):
        pub_cg_i = input_test_func('doi_cg_i')
...@@ -308,7 +315,6 @@ class UpdatingTest(unittest.TestCase):
        pub_lg_2_d24 = input_test_func('doi_lg_2_d24')
        pub_lg_2_d24.group = -2

        moved_1_pub_cg_i = input_test_func('doi_cg_i')
        moved_1_pub_cg_i.group = 1
        moved_1_pub_cg_h11 = input_test_func('doi_cg_h11')
...@@ -331,7 +337,6 @@ class UpdatingTest(unittest.TestCase):
        moved_1_pub_lg_2_h11 = input_test_func('doi_lg_2_h11')
        moved_1_pub_lg_2_h11.group = 1

        moved_2_pub_cg_i = input_test_func('doi_cg_i')
        moved_2_pub_cg_i.group = -1
        moved_2_pub_cg_d11 = input_test_func('doi_cg_d11')
...@@ -367,15 +372,32 @@ class UpdatingTest(unittest.TestCase):
        moved_2_pub_lg_2_d24 = input_test_func('doi_lg_2_d24')
        moved_2_pub_lg_2_d24.group = -3

        pubs = [pub_cg_i, pub_cg_h11, pub_cg_h21, pub_cg_h22, pub_cg_d11, pub_cg_d12, pub_cg_d21, pub_cg_d22,
                pub_lg_1_h23, pub_lg_1_d23, pub_lg_2_h21, pub_lg_2_h22, pub_lg_2_h11, pub_lg_2_i, pub_lg_2_d11,
                pub_lg_2_d12, pub_lg_2_d21, pub_lg_2_d22, pub_lg_2_d23, pub_lg_2_d24]
        edges = []

        nodes, edges, handled_nodes = complete_changed_group_nodes_test(pubs, edges, 'doi_cg_d11', 2, 2, 2, 2)
        self.assertCountEqual(nodes, [moved_1_pub_cg_d11, moved_1_pub_cg_d21, moved_1_pub_lg_1_d23, moved_1_pub_cg_i,
                                      moved_1_pub_lg_1_h23, moved_1_pub_cg_h11, moved_1_pub_lg_2_h11])
        self.assertCountEqual(edges,
                              [['doi_cg_d11', 'doi_lg_1_d23'], ['doi_cg_d11', 'doi_cg_d21'], ['doi_cg_i', 'doi_cg_d11'],
                               ['doi_lg_1_h23', 'doi_cg_i'], ['doi_cg_h11', 'doi_cg_i'], ['doi_lg_2_h11', 'doi_cg_i']])

        nodes, edges, handled_nodes = complete_changed_group_nodes_test(pubs, edges, 'doi_lg_2_h11', 2, 2, 3, 3)
        self.assertCountEqual(nodes, [moved_2_pub_cg_i, moved_2_pub_cg_d11, moved_2_pub_lg_1_d23, moved_2_pub_cg_d21,
                                      moved_2_pub_cg_d12, moved_2_pub_cg_d22, moved_2_pub_lg_2_h21,
                                      moved_2_pub_lg_2_h22, moved_2_pub_lg_2_h11, moved_2_pub_lg_2_i,
                                      moved_2_pub_lg_2_d11, moved_2_pub_lg_2_d21, moved_2_pub_lg_2_d12,
                                      moved_2_pub_lg_2_d22, moved_2_pub_lg_2_d23, moved_2_pub_lg_2_d24])
        self.assertCountEqual(edges,
                              [['doi_cg_d11', 'doi_lg_1_d23'], ['doi_cg_d11', 'doi_cg_d21'], ['doi_cg_i', 'doi_cg_d11'],
                               ['doi_cg_i', 'doi_cg_d12'], ['doi_cg_d12', 'doi_cg_d22'], ['doi_lg_2_h11', 'doi_cg_i'],
                               ['doi_cg_i', 'doi_lg_2_i'], ['doi_lg_2_h21', 'doi_lg_2_h11'],
                               ['doi_lg_2_h22', 'doi_lg_2_h11'], ['doi_lg_2_h11', 'doi_lg_2_i'],
                               ['doi_lg_2_i', 'doi_lg_2_d11'], ['doi_lg_2_d11', 'doi_lg_2_i'],
                               ['doi_lg_2_d11', 'doi_lg_2_d21'], ['doi_lg_2_i', 'doi_lg_2_d12'],
                               ['doi_lg_2_d12', 'doi_lg_2_d22'], ['doi_lg_2_d12', 'doi_lg_2_d23'],
                               ['doi_lg_2_d12', 'doi_lg_2_d24']])
def keep_only_dois(nodes):
...@@ -383,7 +405,7 @@ def keep_only_dois(nodes):
    :param nodes: input list of nodes of type Publication
    :type nodes: List[Publication]

    gets nodes of type pub and returns only their DOIs
    '''
    doi_list = []
    for node in nodes:
...
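# Example added for illustration (not part of the committed file): given Publication objects such as those
# returned by input_test_func above, keep_only_dois presumably yields just their DOI strings,
# e.g. keep_only_dois([pub_lg_1_i, pub_cg_i]) -> ['doi_lg_1_i', 'doi_cg_i'].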
...@@ -36,6 +36,5 @@ update_depth.py
## Authors
- Donna Löding
- Alina Molkentin
- Judith Große
- Malte Schokolowski
#!/usr/bin/env python3
from collections import Counter


def compare_old_and_new_node_lists(old_doi_node_list, new_doi_node_list):
    '''
    :param old_doi_node_list: list of DOIs from old graph
    :type old_doi_node_list: List[String]

    :param new_doi_node_list: list of DOIs from new graph
    :type new_doi_node_list: List[String]

    function to calculate which nodes from the old graph are deleted and which are added
    '''
    dois_from_old_graph = old_doi_node_list  # important: no duplicate DOIs
    dois_from_new_graph = new_doi_node_list
    deleted_nodes = []
    common_nodes = []
    inserted_nodes = []
    all_dois = dois_from_old_graph + dois_from_new_graph
    for doi in all_dois:  # iterates over the merged list of new and old DOIs
        if ((all_dois.count(doi) == 2) & (
                doi not in common_nodes)):  # if the DOI occurs twice, the node is in the old and the new graph
            common_nodes.append(doi)  # appends the DOI to the common ones, if it is not already in it
        elif ((doi in dois_from_old_graph) & (
                doi not in dois_from_new_graph)):  # if the DOI occurs once and comes from the old graph, it is a deleted node
            deleted_nodes.append(doi)  # appends the DOI to the deleted ones
        elif ((doi in dois_from_new_graph) & (
                doi not in dois_from_old_graph)):  # if the DOI occurs once and comes from the new graph, it is an inserted node
            inserted_nodes.append(doi)  # appends the DOI to the inserted ones
    return (common_nodes, inserted_nodes, deleted_nodes)

# Test Prints
# liste_1 = ["doi_1","doi_2","doi_3","doi_4","doi_5"]
# liste_2 = ["doi_1","doi_2","doi_3","doi_6","doi_7"]
# print("common elements: ", doi_listen_vergleichen(liste_1, liste_2)[0])
# print("added elements: ", doi_listen_vergleichen(liste_1, liste_2)[1])
# print("deleted elements: ", doi_listen_vergleichen(liste_1, liste_2)[2])
...@@ -4,9 +4,10 @@ Functions to update a graph representing citations between multiple ACS/Nature j
"""

__authors__ = "Donna Löding, Alina Molkentin, Judith Große, Malte Schokolowski"
__email__ = "cis-project2021@zbh.uni-hamburg.de"
__status__ = "Production"
# __copyright__ = ""
# __credits__ = ["", "", "", ""]
# __license__ = ""
...@@ -25,7 +26,6 @@ from verarbeitung.construct_new_graph.initialize_graph import init_graph_constru
from verarbeitung.construct_new_graph.add_citations_rec import add_citations, get_cit_type_list, create_global_lists_cit


def find_furthermost_citations_test(test_nodes, test_edges, changed_node, old_search_depth, new_search_depth, cit_type):
    global nodes, edges
    nodes = test_nodes
...@@ -33,22 +33,25 @@ def find_furthermost_citations_test(test_nodes, test_edges, changed_node, old_se
    return (find_furthermost_citations(nodes, edges, changed_node, old_search_depth, new_search_depth, cit_type))


def complete_changed_group_nodes_test(test_nodes, test_edges, inserted_test_nodes, old_search_depth, old_search_height,
                                      new_search_depth, new_search_height):
    global nodes, edges
    nodes = test_nodes
    edges = test_edges
    handled_nodes, new_nodes, new_edges = complete_changed_group_nodes(inserted_test_nodes, old_search_depth,
                                                                       old_search_height, new_search_depth,
                                                                       new_search_height, True)
    return (new_nodes, new_edges, handled_nodes)


def find_furthermost_citations(new_nodes, new_edges, node, old_search_depth, new_search_depth, cit_type):
    '''
    :param new_nodes: list of nodes which are generated separately from main node list to avoid recursive problems
    :type new_nodes: List[Publication]

    :param new_edges: list of edges which are generated separately from main edge list to avoid recursive problems
    :type new_edges: List[List[String, String]]

    :param node: node which is known but not from input group
...@@ -78,7 +81,7 @@ def find_furthermost_citations(new_nodes, new_edges, node, old_search_depth, new
            if cit_type == "Citation":

                # to find a cycle and not change height
                not_in_citations = True
                for new_cit_node_citation in new_cit_node.citations:
                    if (cit_node.doi_url == new_cit_node_citation.doi_url):
...@@ -94,7 +97,7 @@ def find_furthermost_citations(new_nodes, new_edges, node, old_search_depth, new
            elif cit_type == "Reference":

                # to find a cycle and not change depth
                not_in_citations = True
                for new_cit_node_reference in new_cit_node.references:
                    if (new_cit_node.doi_url == new_cit_node_reference.doi_url):
...@@ -113,11 +116,12 @@ def find_furthermost_citations(new_nodes, new_edges, node, old_search_depth, new
        if new_citation not in new_nodes:
            new_nodes.append(new_citation)

    # returns the references/citations which need to be processed to complete construction
    return (citations_saved)


def complete_changed_group_nodes(inserted_nodes, old_search_depth, old_search_height, new_search_depth,
                                 new_search_height, test_var):
    '''
    :param inserted_nodes: list of nodes which are inserted to new input array
    :type inserted_nodes: List[String]
...@@ -134,7 +138,7 @@ def complete_changed_group_nodes(inserted_nodes, old_search_depth, old_search_he
    :param new_search_height: height to search for citations from new construction call
    :type new_search_height: int

    :param test_var: variable to differentiate between test and url call
    :type test_var: boolean

    completes the references and citations for nodes which were known in non input group
...@@ -154,14 +158,16 @@ def complete_changed_group_nodes(inserted_nodes, old_search_depth, old_search_he
            # get pub from input
            pub = get_pub(node.doi_url, test_var)
            if (type(pub) != Publication):
                error_doi_list.append(node.doi_url)
                continue

            # find old maximum publications and complete tree to new max depth
            pub.group = node.group
            old_max_references = find_furthermost_citations(new_nodes, new_edges, pub, old_search_depth,
                                                            new_search_depth, "Reference")
            add_citations(new_nodes, new_edges, old_max_references,
                          min(old_search_depth - abs(node.group), new_search_depth), new_search_depth, "Reference",
                          test_var)

            # add tree for citations
            add_citations(new_nodes, new_edges, [pub], 0, new_search_height, "Citation", test_var)
...@@ -176,14 +182,16 @@ def complete_changed_group_nodes(inserted_nodes, old_search_depth, old_search_he
            # get pub from input
            pub = get_pub(node.doi_url, test_var)
            if (type(pub) != Publication):
                error_doi_list.append(node.doi_url)
                continue

            # find old maximum publications and complete tree to new max height
            pub.group = node.group
            old_max_citations = find_furthermost_citations(new_nodes, new_edges, pub, old_search_height,
                                                           new_search_height, "Citation")
            add_citations(new_nodes, new_edges, old_max_citations,
                          min(old_search_height - abs(node.group), new_search_height), new_search_height, "Citation",
                          test_var)

            # add tree for references
            add_citations(new_nodes, new_edges, [pub], 0, new_search_depth, "Reference", test_var)
...@@ -201,7 +209,8 @@ def complete_changed_group_nodes(inserted_nodes, old_search_depth, old_search_he
    return (handled_inserted_nodes, new_nodes, new_edges)


def connect_old_and_new_input(input_nodes_list, input_edges_list, inserted_nodes, old_search_depth, old_search_height,
                              new_search_depth, new_search_height, test_var=False):
    '''
    :param input_nodes_list: list of nodes which are processed for new construction call
    :type input_nodes_list: List[Publication]
...@@ -224,7 +233,7 @@ def connect_old_and_new_input(input_nodes_list, input_edges_list, inserted_nodes
    :param new_search_height: height to search for citations from new construction call
    :type new_search_height: int

    :param test_var: variable to differentiate between test and url call
    :type test_var: boolean

    completes the references and citations for nodes which were known in non input group
...@@ -234,15 +243,19 @@ def connect_old_and_new_input(input_nodes_list, input_edges_list, inserted_nodes
    edges = input_edges_list.copy()
    error_doi_list = []

    handled_inserted_nodes, new_nodes, new_edges = complete_changed_group_nodes(inserted_nodes, old_search_depth,
                                                                                old_search_height, new_search_depth,
                                                                                new_search_height, test_var)

    # copy all nodes from inserted_nodes to new node, if node is not in handled_inserted_nodes
    not_handled_inserted_nodes = [node for node in inserted_nodes if node not in handled_inserted_nodes]

    # function call to begin recursive processing up to max depth/height for unhandled nodes
    if len(not_handled_inserted_nodes) > 0:
        new_nodes, new_edges, error_doi_list_new = init_graph_construction(not_handled_inserted_nodes, new_search_depth,
                                                                           new_search_height, test_var=test_var,
                                                                           update_var=True, input_nodes=new_nodes,
                                                                           input_edges=new_edges)
        for err_node in error_doi_list_new:
            if err_node not in error_doi_list:
                error_doi_list.append(err_node)
...
...@@ -4,7 +4,7 @@ Functions to remove publications/links from nodes/edges list, if they can no lon
"""

__authors__ = "Donna Löding, Alina Molkentin, Judith Große, Malte Schokolowski"
__email__ = "cis-project2021@zbh.uni-hamburg.de"
__status__ = "Production"
#__copyright__ = ""
...@@ -34,7 +34,6 @@ def search_ref_cit_graph_rec_test(pubs, new_test_input, old_max_depth, cit_var):
    return usable_nodes


def search_ref_graph_rec(pub, curr_depth, old_max_depth):
    '''
    :param pub: pub to get appended to usable_nodes
...@@ -87,7 +86,7 @@ def search_cit_graph_rec(pub, curr_height, old_max_height):
            usable_nodes.append(cit_pub)
            usable_doi_nodes.append(cit_pub.doi_url)

        # to find a cycle and avoid recursion error
        not_in_references = True
        for reference in pub.references:
            if (citation.doi_url == reference.doi_url and reference.doi_url not in usable_doi_nodes):
...@@ -103,7 +102,7 @@ def delete_nodes_and_edges(input_list, common_nodes, old_edges_list, old_depth,
    :param input_list: list of publications to get reduced
    :type input_list: List[Publication]

    :param common_nodes: list of input DOIs which are in old and new input call
    :type common_nodes: List[String]

    :param old_edges_list: list of links between publications from old call
...
...@@ -4,7 +4,7 @@ Functions to read old json files to recreate old graph structure
"""

__authors__ = "Donna Löding, Alina Molkentin, Judith Große, Malte Schokolowski"
__email__ = "cis-project2021@zbh.uni-hamburg.de"
__status__ = "Production"
#__copyright__ = ""
...@@ -21,10 +21,9 @@ sys.path.append("../")
from input.publication import Publication, Citation


def create_pubs_from_json(input_dict):
    '''
    :param input_dict: dictionary read from old graph json file
    :type input_dict: dictionary

    creates list of publications retrieved from old json file
...@@ -39,9 +38,10 @@ def create_pubs_from_json(input_dict):
        # appends the objects to a list
        list_of_nodes_py.append(pub)


def add_ref_and_cit_to_pubs(input_dict):
    '''
    :param input_dict: dictionary read from old graph json file
    :type input_dict: dictionary

    adds references and citations to retrieved publication list
...@@ -52,7 +52,7 @@ def add_ref_and_cit_to_pubs(input_dict):
        for source in list_of_nodes_py:
            for target in list_of_nodes_py:

                # when the correct DOIs are found, adds them as references/citations to the publication list
                if ((source.doi_url == edge["source"]) and (target.doi_url == edge["target"])):
                    new_reference = Citation(target.doi_url, target.title, target.journal, target.contributors, "Reference")
                    source.references.append(new_reference)
...@@ -66,7 +66,7 @@ def add_ref_and_cit_to_pubs(input_dict):
def input_from_json(json_file):
    '''
    :param json_file: JSON file for the old graph
    :type json_file: String

    retrieves information from old json file to be reused for new graph construction
...@@ -90,5 +90,4 @@ def input_from_json(json_file):
    old_depth = old_depth_height[0]
    old_height = old_depth_height[1]

    return(list_of_nodes_py, list_of_edges_py, old_depth, old_height)
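# Example added for illustration (not part of the committed file): the loop in add_ref_and_cit_to_pubs only
# relies on every edge entry carrying "source" and "target" DOI keys, so a single edge read from the old
# json file could look like
#   edge = {"source": "doi_lg_1_i", "target": "doi_lg_1_d11"}
# the wider structure of the json file (names of the node and edge lists) is not visible in this diff.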
...@@ -4,7 +4,7 @@ Functions to update the citation depth of recursive graph construction
"""

__authors__ = "Donna Löding, Alina Molkentin, Judith Große, Malte Schokolowski"
__email__ = "cis-project2021@zbh.uni-hamburg.de"
__status__ = "Production"
#__copyright__ = ""
...@@ -45,6 +45,7 @@ def reduce_max_height_depth_test(pubs, max_dh, dh_var):
        reduce_max_depth(max_dh)
    return processed_input_list


def get_old_max_references_citations_test(pubs, old_dh, dh_var):
    '''
    :param pubs: list of publications to reduce height/depth in
...@@ -66,6 +67,7 @@ def get_old_max_references_citations_test(pubs, old_dh, dh_var):
    else:
        return(get_old_max_references(old_dh, True))


def reduce_max_height(max_height):
    '''
    :param max_height: new maximum height to reduce publications in publication list to
...@@ -79,6 +81,7 @@ def reduce_max_height(max_height):
        if (pub.group > max_height):
            processed_input_list.remove(pub)


def reduce_max_depth(max_depth):
    '''
    :param max_depth: new maximum depth to reduce publications in publication list to
...@@ -110,6 +113,7 @@ def get_old_max_references(old_depth, test_var):
            old_max_references.append(pub)
    return(old_max_references)


def get_old_max_citations(old_height, test_var):
    '''
    :param old_height: old maximum height to search for citations
...@@ -122,11 +126,11 @@ def get_old_max_citations(old_height, test_var):
        if (pub.group == old_height):
            pub = get_pub(pub.doi_url, test_var)
            if (type(pub) != Publication):
                continue
            old_max_citations.append(pub)
    return(old_max_citations)


def update_depth(obj_input_list, input_edges, new_depth, new_height, old_depth, old_height, test_var):
    '''
    :param obj_input_list: input list of publications of type Publication from update_graph
...@@ -141,7 +145,7 @@ def update_depth(obj_input_list, input_edges, new_depth, new_height, old_depth,
    :param new_height: new maximum height to search for citations
    :type new_height: int

    :param test_var: variable to differentiate between test and url call
    :type test_var: boolean

    function to adjust old publication search depth to update call
...@@ -164,12 +168,8 @@ def update_depth(obj_input_list, input_edges, new_depth, new_height, old_depth,
        old_max_citations = get_old_max_citations(old_height, test_var)
        add_citations(processed_input_list, valid_edges, old_max_citations, old_height, new_height, "Citation", test_var)
    back_to_valid_edges(valid_edges, processed_input_list)

    # adds edges between reference group and citation group of known publications
...
...@@ -12,7 +12,6 @@ def back_to_valid_edges(links_from_json, processed_input_list):
    '''
    list_of_valid_edges = links_from_json.copy()

    # iterates over all edges from old graph
    for edge in list_of_valid_edges:
...
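# Sketch added for illustration (not the committed implementation): based on the unit test for
# back_to_valid_edges above, the function keeps only edges whose source and target DOIs still belong to a
# publication in processed_input_list, mutating the edge list in place. A standalone equivalent could be:
#   valid_dois = {pub.doi_url for pub in processed_input_list}
#   links_from_json[:] = [edge for edge in links_from_json
#                         if edge[0] in valid_dois and edge[1] in valid_dois]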