Fix IOPub error when showing progress with get_raw_data(). #40

Merged 10 commits on Oct 31, 2024
2 changes: 2 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
Original file line number Diff line number Diff line change
@@ -27,7 +27,9 @@
<!--- Go over all the following points and make sure they have all been completed -->
<!--- If you're unsure about any of these, don't hesitate to ask. We're here to help! -->
- [ ] `CHANGELOG.md` has been updated
- [ ] `xdmod_data/__version__.py` has been updated to the next development version
- [ ] The milestone is set correctly on the pull request
- [ ] The appropriate labels have been added to the pull request
- [ ] Running the automated tests (see `docs/developing.md`) produces no errors
- [ ] Updates have been made to the `xdmod-notebooks` repository as necessary, and the notebooks all run successfully
- [ ] The changes in this PR have been ported/backported to other branches as needed
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -1,8 +1,9 @@
# xdmod-data Changelog

-## Main development branch
+## v1.x.y development branch

- Document Open XDMoD compatibility in changelog ([\#31](https://github.com/ubccr/xdmod-data/pull/31)).
+- Fix IOPub error when showing progress with `get_raw_data()` ([\#37](https://github.com/ubccr/xdmod-data/pull/37)).

## v1.0.1 (2024-09-27)

1 change: 0 additions & 1 deletion docs/developing.md
@@ -90,7 +90,6 @@
1. Go to the [GitHub milestones](https://github.com/ubccr/xdmod-data/milestones) and close the milestone for the version.

## After release

1. Make a new branch of `xdmod-data` and:
1. Make sure the version number is updated in `xdmod_data/__version__.py` to a pre-release of the next version, e.g., `1.0.1-01`.
1. Update `CHANGELOG.md` to add a section at the top called `Main development branch`.
4 changes: 2 additions & 2 deletions tests/regression/data/jobs-dimensions.csv
@@ -2,7 +2,7 @@ id,label,description
none,None,Summarizes jobs reported to the ACCESS allocations service (excludes non-ACCESS usage of the resource).
allocation,Allocation,A funded project that is allowed to run jobs on resources.
fieldofscience,Field of Science,The field of science indicated on the allocation request pertaining to the running jobs.
-gateway,Gateway,A science gateway is a portal set up to aid submiting jobs to resources.
+gateway,Gateway,A science gateway is a portal set up to aid submitting jobs to resources.
grant_type,Grant Type,A categorization of the projects/allocations.
jobsize,Job Size,A categorization of jobs into discrete groups based on the number of cores used by each job.
jobwaittime,Job Wait Time,A categorization of jobs into discrete groups based on the total linear time each job waited.
@@ -19,7 +19,7 @@ resource,Resource,A resource is a remote computer that can run jobs.
resource_type,Resource Type,A categorization of resources into by their general capabilities.
provider,Service Provider,A service provider is an institution that hosts resources.
username,System Username,The specific system username of the users who ran jobs.
-person,User,"A person who is on a PIs allocation, hence able run jobs on resources."
+person,User,"A person who is on a PIs allocation, hence able to run jobs on resources."
institution,User Institution,Organizations that have users with allocations.
institution_country,User Institution Country,The name of the country of the institution of the person who ran the compute job.
institution_state,User Institution State,The location of the institution of the person who ran the compute job.
6 changes: 3 additions & 3 deletions tests/regression/data/jobs-metrics.csv
@@ -1,4 +1,5 @@
id,label,description
+utilization,ACCESS CPU Utilization (%),"The percentage of the ACCESS obligation of a resource that has been utilized by ACCESS jobs.<br/><i> ACCESS CPU Utilization:</i> The ratio of the total CPU hours consumed by ACCESS jobs over a given time period divided by the total CPU hours that the system is contractually required to provide to ACCESS during that period. It does not include non-ACCESS jobs.<br/>It is worth noting that this value is a rough estimate in certain cases where the resource providers don't provide accurate records of their system specifications, over time."
avg_ace,ACCESS Credit Equivalents Charged: Per Job (SU),"The average amount of ACCESS Credit Equivalents charged per compute job.<br/>

The ACCESS Credit Equivalent is a measure of how much compute time was used on each resource.
@@ -15,7 +16,6 @@ The ACCESS Credit Equivalent allows comparison between usage of node-allocated,
resources. It also allows a comparison between resources with different compute power per core.
The <a href=""https://allocations.access-ci.org/exchange_calculator"" target=""_blank"" rel=""noopener noreferrer"">ACCESS allocations exchange calculator</a>
lists conversion rates between an ACCESS Credit Equivalent and a service unit on a resource."
-utilization,ACCESS Utilization (%),"The percentage of the ACCESS obligation of a resource that has been utilized by ACCESS jobs.<br/><i> ACCESS Utilization:</i> The ratio of the total CPU hours consumed by ACCESS jobs over a given time period divided by the total CPU hours that the system is contractually required to provide to ACCESS during that period. It does not include non-ACCESS jobs.<br/>It is worth noting that this value is a rough estimate in certain cases where the resource providers don't provide accurate records of their system specifications, over time."
rate_of_usage,Allocation Usage Rate (XD SU/Hour),The rate of ACCESS allocation usage in XD SUs per hour.
rate_of_usage_ace,Allocation Usage Rate ACEs (SU/Hour),The rate of ACCESS allocation usage in ACCESS Credit Equivalents per hour.
avg_cpu_hours,CPU Hours: Per Job,"The average CPU hours (number of CPU cores x wall time hours) per ACCESS job.<br/>For each job, the CPU usage is aggregated. For example, if a job used 1000 CPUs for one minute, it would be aggregated as 1000 CPU minutes or 16.67 CPU hours."
@@ -82,7 +82,7 @@ Current TeraGrid supercomputers have complex multi-core and memory hierarchies.

Note: The actual charge will depend on the specific requirements of the job (e.g., the mapping of the cores across the machine, or the priority you wish to obtain).<br/>

-Note 2: The SUs show here have been normalized against the XSEDE Roaming service. Therefore they are comparable across resources."
+Note 2: The SUs shown here have been normalized against the XSEDE Roaming service. Therefore they are comparable across resources."
total_su,XD SUs Charged: Total,"The total amount of XD SUs charged by ACCESS jobs.<br/>
<i>XD SU: </i>1 XSEDE SU is defined as one CPU-hour on a Phase-1 DTF cluster.<br/>
<i>SU - Service Units: </i>Computational resources on the XSEDE are allocated and charged in service units (SUs). SUs are defined locally on each system, with conversion factors among systems based on HPL benchmark results.<br/>
@@ -91,4 +91,4 @@ Current TeraGrid supercomputers have complex multi-core and memory hierarchies.

Note: The actual charge will depend on the specific requirements of the job (e.g., the mapping of the cores across the machine, or the priority you wish to obtain).<br/>

-Note 2: The SUs show here have been normalized against the XSEDE Roaming service. Therefore they are comparable across resources."
+Note 2: The SUs shown here have been normalized against the XSEDE Roaming service. Therefore they are comparable across resources."
Original file line number Diff line number Diff line change
@@ -1,44 +1,44 @@
,Nodes,Requested Wall Time,Wait Time,Wall Time,CPU User,"Mount point ""home"" data written","Mount point ""scratch"" data written",Total memory used
-0,1,172800,11,506,,,,
-1000,1,86400,1,66,,,,
-2000,1,86400,18,752,,,,
-3000,1,86400,8,5434,,,,
-4000,1,86400,6,1572,,,,
-5000,1,172800,7,2592,,,,
-6000,1,14400,7,2800,,,,
-7000,1,3600,2894,1357,,,,
-8000,1,21600,116,7277,,,,
-9000,1,21600,2173,6764,,,,
-10000,1,21600,3574,7095,,,,
-11000,1,9000,4,3564,88.01518903173182,992.4354304606816,267087841.6178405,811231171.4285715
-12000,1,21600,158,5565,,,,
-13000,1,21600,59,6965,,,,
-14000,1,21600,9,7760,,,,
-15000,1,3600,22122,1335,,,,
-16000,1,28800,130,9421,12.262731018331898,19749.432156075072,0,787292327.46875
-17000,1,28800,6,1990,,,,
-18000,1,172800,13,73,,,,
-19000,1,172800,7,129,,,,
-20000,1,25200,4,25211,82.16279473845965,0,5113844.667942916,240912572.72941175
-21000,1,21600,18,6099,,,,
-22000,1,21600,27,7131,,,,
-23000,1,1800,61,1079,35.02319701051263,5818814744.200479,0,91742777.25
-24000,1,3600,5,2306,0.11814596015380158,,,33983854837.760006
-25000,1,960,1,59,2.025062333453586,0,0,118141168
-26000,1,172800,1,20494,87.54061396105656,548.1123048956289,0,224020798.15942028
-27000,4,7200,2,7214,99.2396948622311,441.0514348202534,34392345950.27519,1104895888.4
-28000,1,21600,13,55,1.2148444482641405,,,
-29000,1,21600,171,40,,,,
-30000,1,960,0,42,1.5504320217730077,0,0,112133180
-31000,1,1800,11,183,25.94758412119134,,,129784697856
-32000,1,21600,372,114,1.5571541609296509,,,92681043968
-33000,2,1800,134,139,55.875186246345784,533.6963000565588,754284385.8435647,137041136
-34000,1,7200,74,9,0.94096807333301,4681.666820975073,0,145688575
-35000,1,172800,22,83953,98.95217460976379,,,91787361316.73543
-36000,1,6000,8,152,0.4601673251104144,,,85277047466.66667
-37000,1,900,124,137,96.18834348033303,,,35703571797.333336
-38000,1,21600,12,56,24.892228849477622,,,
-39000,1,21600,12,134,26.487756894710913,,,113801609216
-40000,1,21600,12,229,45.74138522053433,,,42761861802.66667
-41000,1,21600,20,307,0.9428384414161763,,,27184930360.88889
-42000,1,21600,130,386,1.68608777466353,,,49510031732.36363
+0,1,172800,2,15048,67.78143277322484,0,0,149919987.98039216
+1000,1,1800,7,133,,,,
+2000,1,86400,8,1997,,,,
+3000,2,60,7,10,,,,
+4000,1,172800,448,88,,,,
+5000,1,72000,1514,5277,,,,
+6000,1,3600,5575,1340,,,,
+7000,1,900,155,252,,,,
+8000,1,172800,6,12013,,,,
+9000,1,3600,111,36,1.8511046269271951,0,12415575.890133914,223612927
+10000,1,21600,9,5993,,,,
+11000,1,3600,18425,1346,,,,
+12000,1,21600,9,7839,,,,
+13000,1,3600,22445,1321,,,,
+14000,1,3600,211,3,,,,
+15000,1,86400,1681,108,,,,
+16000,2,172800,2,85924,35.32048284494427,0,0,669471630.8666667
+17000,1,28800,48,39,,,,
+18000,1,7200,0,611,1.6317304418827532,0,0,242301667
+19000,1,21600,12,1662,,,,
+20000,1,21600,26,7206,,,,
+21000,1,3600,0,1095,20.298036056662443,307784.68622169655,0,377195866
+22000,1,172800,3,280,87.38755792994296,,,131463737856
+23000,1,960,3,42,1.573028452792688,0,0,153550667
+24000,1,960,0,42,1.4618214897575181,0,0,141788220
+25000,256,172800,169623,169681,90.53302268073315,716.4750310720816,110578427985.13992,426185127.56684494
+26000,1,21600,13,72,49.75276270147344,,,118342819840
+27000,1,21600,191,36,,,,
+28000,1,21600,17,21,,,,
+29000,1,1800,11,211,50.00127478407303,,,124171051827.2
+30000,1,21600,328,184,36.69212927476799,,,144036221952
+31000,1,18000,13,6,,,,
+32000,1,960,1,41,1.5277511080658916,0,0,172152952
+33000,1,129600,476,15,,,,
+34000,1,172800,1,15897,98.96305566793583,,,105552713859.87883
+35000,1,7200,0,7217,,,,
+36000,1,21600,1,93,17.130505149503232,,,147102908416
+37000,1,21600,11,150,1.5495503254849237,,,155952593578.66666
+38000,1,21600,1,256,31.266221286764996,,,60418209792
+39000,1,21600,334,10,,,,
+40000,1,21600,334,221,6.50711299161715,,,53499913011.2
+41000,1,3600,1,115,0.2959079693229115,,,163743516672
+42000,4,7200,0,7215,98.47234983404904,441.1070124335988,4020681635.3062363,1288843817.88
1 change: 1 addition & 0 deletions tests/regression/data/realms.csv
@@ -5,4 +5,5 @@ Cloud,Cloud
Gateways,Gateways
Jobs,Jobs
Requests,Requests
+ResourceSpecifications,Resource Specifications
SUPREMM,SUPREMM
2 changes: 1 addition & 1 deletion xdmod_data/__version__.py
@@ -1,2 +1,2 @@
__title__ = 'xdmod-data'
-__version__ = '2.0.0-01'
+__version__ = '1.0.2.dev1'
27 changes: 13 additions & 14 deletions xdmod_data/_http_requester.py
@@ -63,15 +63,12 @@ def _request_raw_data(self, params):
                     response = {'fields': line_json}
                 else:
                     data.append(line_json)
-                    if params['show_progress']:
-                        progress_msg = (
-                            'Got ' + str(i) + ' row' + ('' if i == 1 else 's')
-                            + '...'
-                        )
-                        print(progress_msg, end='\r')
+                    # Only print every 10,000 rows to avoid I/O rate errors.
+                    if params['show_progress'] and i % 10000 == 0:
+                        self.__print_progress_msg(i, '\r')
                 i += 1
             if params['show_progress']:
-                print(progress_msg + 'DONE')
+                self.__print_progress_msg(i, 'DONE\n')
         else:
             num_rows = limit
             offset = 0
@@ -83,16 +80,11 @@ def _request_raw_data(self, params):
                 partial_data = response['data']
                 data += partial_data
                 if params['show_progress']:
-                    progress_msg = (
-                        'Got ' + str(len(data)) + ' row'
-                        + ('' if len(data) == 1 else 's')
-                        + '...'
-                    )
-                    print(progress_msg, end='\r')
+                    self.__print_progress_msg(len(data), '\r')
                 num_rows = len(partial_data)
                 offset += limit
             if params['show_progress']:
-                print(progress_msg + 'DONE')
+                self.__print_progress_msg(len(data), 'DONE\n')
         return (data, response['fields'])

     def _request_filter_values(self, realm_id, dimension_id):
@@ -210,3 +202,10 @@ def __get_raw_data_limit(self):
                 else:
                     raise
         return self.__raw_data_limit
+
+    def __print_progress_msg(self, num_rows, end='\n'):
+        progress_msg = (
+            'Got ' + str(num_rows) + ' row' + ('' if num_rows == 1 else 's')
+            + '...'
+        )
+        print(progress_msg, end=end)
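
The throttling approach in this change can be illustrated in isolation. The following is a minimal standalone sketch, not the library's code: `format_progress_msg`, `consume_rows`, and the `every` and `out` parameters are hypothetical names introduced for illustration. Only the message format and the 10,000-row default interval are taken from the diff above. The sketch counts collected rows with `len(data)`, which is an assumption of this example rather than how the streaming branch tracks its index.

```python
import io


def format_progress_msg(num_rows):
    """Build the 'Got N rows...' message, singular for exactly one row."""
    return (
        'Got ' + str(num_rows) + ' row' + ('' if num_rows == 1 else 's')
        + '...'
    )


def consume_rows(rows, show_progress=True, every=10000, out=None):
    """Collect rows, writing a progress message once per `every` rows.

    Writing a message for every single row can flood the output channel;
    throttling keeps the number of writes small. `out=None` makes print()
    fall back to sys.stdout.
    """
    data = []
    for row in rows:
        data.append(row)
        if show_progress and len(data) % every == 0:
            # '\r' moves the cursor back to the start of the line so the
            # next message overwrites this one.
            print(format_progress_msg(len(data)), end='\r', file=out)
    if show_progress:
        # Always report the final count, followed by DONE and a newline.
        print(format_progress_msg(len(data)), end='DONE\n', file=out)
    return data


if __name__ == '__main__':
    # Capture the output in memory to show exactly what is written.
    buffer = io.StringIO()
    consume_rows(range(25), every=10, out=buffer)
    print(repr(buffer.getvalue()))
```

With `every=10` and 25 rows, the sketch writes two intermediate messages and one final message instead of 25, which is the same reduction the change applies at a 10,000-row interval.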