Use a reference cost estimator based on CPU usage * time. We could use open-source tools as a reference for this estimate. AWS provides such pricing information: https://aws.amazon.com/ec2/pricing/on-demand/
I completely agree with you and your way of thinking. Considering that some nodes may also have GPUs, TPUs, etc., we could estimate the cost per node category. Each node would be categorized based on its overall performance.
For example, some categories might be small and large-gpu, and the cost of a small node would be computed as category_cost/hour * usage.
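As a minimal sketch of that idea (the category names and hourly rates below are illustrative placeholders, not actual AWS prices):

```python
# Hypothetical hourly rates per node category (illustrative values only, not real AWS prices).
CATEGORY_RATES_PER_HOUR = {
    "small": 0.05,       # USD/hour
    "large": 0.40,
    "large-gpu": 1.20,
}

def estimate_node_cost(category: str, usage_hours: float) -> float:
    """Estimate monetary cost as category_cost/hour * usage."""
    return CATEGORY_RATES_PER_HOUR[category] * usage_hours

# Example: a "small" node used for 10 hours
print(estimate_node_cost("small", 10.0))  # 0.5
```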
Feedback from Task 4.3 (vpap)
As a backup solution, we could avoid quantitative cost information (e.g., exact cost per hour) and instead work with qualitative categories (e.g., high/medium/low cost).
For the initial version, a simple solution should be more than enough. We can use a qualitative approach to classify each node into high, medium, or low cost categories based on its available resources (CPU, RAM, GPU, etc.).
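A minimal sketch of such a classifier, assuming hypothetical thresholds and resource fields rather than the ones Hypertool actually uses:

```python
def classify_node_cost(cpu_cores: int, ram_gb: float, has_gpu: bool = False) -> str:
    """Assign a qualitative cost category (high/medium/low) from a node's available resources.
    The thresholds below are placeholders chosen for illustration only."""
    if has_gpu or cpu_cores >= 32 or ram_gb >= 128:
        return "high"
    if cpu_cores >= 8 or ram_gb >= 32:
        return "medium"
    return "low"

# Example: a 4-core, 16 GB node without a GPU is classified as low cost.
print(classify_node_cost(cpu_cores=4, ram_gb=16))  # "low"
```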
I had a look at the code, but I need some clarifications. These could also be useful for the docs later on.
As far as I understand, cloud_cost.xlsx is a static file that is not updated regularly—perhaps only when AWS changes the cost model. The other files, scaler.json and cost_centroids.npy, are used by Hypertool to perform calculations. I’m not sure about clustered_instances.xlsx—is it actually used, and does it need to be kept in GitLab? Otherwise, feel free to add it to .gitignore.
Am I correct in thinking that the script is intended to run only after cloud_cost.xlsx is updated by the maintainer of Hypertool, not by the end user? If so, maybe you could create a separate folder dedicated to the calculation of the monetary cost for better separation of concerns (for example monetary_cost_calculation).
Could you provide a short description of when the script is meant to be run, by whom, and what results should be taken into account for the final calculation of node-monetary-cost? I suppose the model could also be extended in the future, for example, to include GPU costs if required.
Yes, you're right—cloud_cost.xlsx is a static file maintained by Hypertool developers and only updated when major pricing changes occur on AWS (e.g., AWS On-Demand Pricing). The scaler.json and cost_centroids.npy files are generated from this data and used at runtime to infer a node’s monetary cost category based on CPU and RAM. The clustered_instances.xlsx file is just for debugging and can be excluded from the repository. The clustering script is only meant to be run by maintainers, not end users. I agree with your suggestion to move this logic into a separate folder for clarity and future extensibility (e.g., GPU pricing).
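For illustration, a rough sketch of how that runtime inference could look; the layout of scaler.json (assumed here to hold per-feature "min"/"max" lists) and the feature order in cost_centroids.npy are assumptions, not the confirmed Hypertool format:

```python
import json
import numpy as np

def infer_cost_category(cpu_cores: float, ram_gb: float,
                        scaler_path: str = "scaler.json",
                        centroids_path: str = "cost_centroids.npy") -> int:
    """Min-max scale the node's CPU/RAM and return the index of the nearest cost centroid.
    Assumes scaler.json stores {"min": [...], "max": [...]} per feature (not confirmed)."""
    with open(scaler_path) as f:
        scaler = json.load(f)
    centroids = np.load(centroids_path)                      # shape: (n_categories, 2)
    features = np.array([cpu_cores, ram_gb], dtype=float)
    lo = np.array(scaler["min"], dtype=float)
    hi = np.array(scaler["max"], dtype=float)
    scaled = (features - lo) / (hi - lo)
    distances = np.linalg.norm(centroids - scaled, axis=1)   # distance to each cluster centroid
    return int(np.argmin(distances))                         # index of the closest cost category
```

The returned index would then map to a cost label (e.g., low/medium/high) according to however the clustering script ordered the centroids.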
First of all, thank you for your prompt response and the changes you made based on my comments — I really appreciate it.
I’ve also done some refactoring on my side, updated dependencies, and tested the implementation both locally (MicroK8s) and on the staging AWS EKS cluster. Additionally, I added the necessary configuration to support installation as an independent Python package (MANIFEST.in and pyproject.toml).
To save time, instead of pulling your latest commit and adapting my refactor, I’ve pushed my changes to a new branch called monetary_cost_b.
I noticed that you changed the format of some files from .xlsx to .csv. Could you please apply that change to monetary_cost_b as well? Then I'll merge the branch into master.
The documentation is still pending, but it will be handled in a separate issue.
Thank you for your input and suggestions.
Sure, I will update the monetary_cost_b branch with my minor changes today and will also add a README.md with all the implementation details.
We use reStructuredText (RST) as the documentation format, so please don't add the docs now. A docs folder will be created in the root of the project for all documentation, and Sphinx will be used later to render it; my colleague is working on that.