Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse cpu a76 core as a55 result in serious performance problem #195

Open
xiaotongnii opened this issue Oct 25, 2023 · 5 comments
Open

Parse cpu a76 core as a55 result in serious performance problem #195

xiaotongnii opened this issue Oct 25, 2023 · 5 comments
Labels

Comments

@xiaotongnii
Copy link

xiaotongnii commented Oct 25, 2023

Platform:android_arm64-android13
SOC: cortex A 4a55 and 4a76
Cpuinfo:core_siblings_list:0-7
HW:design 4a55 and 4a76 as a cluster on SOC and configure dtsi cpu 8core as a cluster.

Problem:Use benchmark tools test CPU performance,find cpuinfo parser cpu a76 core as a55 result in serious performance problem on Arm CPU

The cpuinfo call the main function as the following:
https://github.com/pytorch/cpuinfo/blob/76d5e8f5b563daa65340a60fce0e9aec73a936df/src/arm/linux/init.c#L34C37-L34C37

Use processors[0].package_leader_id update processors[sibling].package_leader_id(siblings value form 0 to 7).
Result in processors[i].package_leader_id value is 0.(i form 0 to 7)

//init.c:34
static bool cluster_siblings_parser(
	uint32_t processor, uint32_t siblings_start, uint32_t siblings_end,
	struct cpuinfo_arm_linux_processor* processors)
{
	for(int i = 0; i < 8; i++)
	   {cpuinfo_log_debug("init.c.39.processors[i].package_leader_id:%d, ",processors[i].package_leader_id);}
	processors[processor].flags |= CPUINFO_LINUX_FLAG_PACKAGE_CLUSTER;
	uint32_t package_leader_id = processors[processor].package_leader_id;
    cpuinfo_log_debug("File: %s, Function: %s, Line: %d, processor:%d, package_leader_id:%d,processors[processor].package_leader_id:%d,siblings_start:%d, siblings_end:%d\n", __FILE__, __func__, __LINE__,processor, package_leader_id,processors[processor].package_leader_id,siblings_start,siblings_end);

	for (uint32_t sibling = siblings_start; sibling < siblings_end; sibling++) {
		if (!bitmask_all(processors[sibling].flags, CPUINFO_LINUX_FLAG_VALID)) {
			cpuinfo_log_info("invalid processor %"PRIu32" reported as a sibling for processor %"PRIu32,
				sibling, processor);
			continue;
		}

		const uint32_t sibling_package_leader_id = processors[sibling].package_leader_id;
		if (sibling_package_leader_id < package_leader_id) {
			package_leader_id = sibling_package_leader_id;
		}
		// use processors[0].package_leader_id update processors[sibling].package_leader_id.
               //  processors[sibling].package_leader_id value is 0.
		processors[sibling].package_leader_id = package_leader_id;
		processors[sibling].flags |= CPUINFO_LINUX_FLAG_PACKAGE_CLUSTER;
		cpuinfo_log_debug("File: %s, Function: %s, Line: %d, sibling:%d, sibling_package_leader_id:%d, package_leader_id:%d\n", __FILE__, __func__, __LINE__,sibling,sibling_package_leader_id,package_leader_id);

	}

	processors[processor].package_leader_id = package_leader_id;

	return true;
}

Due to processors[processor].package_leader_id (processor from 0 to 7) is 0, cluster_leader value is 0 all the time.

if (cluster_leader == i) {

	/* Initialize core vendor, uarch, MIDR, and frequency for every logical processor */
	for (uint32_t i = 0; i < arm_linux_processors_count; i++) {
		if (bitmask_all(arm_linux_processors[i].flags, CPUINFO_LINUX_FLAG_VALID)) {
			const uint32_t cluster_leader = arm_linux_processors[i].package_leader_id;
			if (cluster_leader == i) {
				/* Cluster leader: decode core vendor and uarch */
				cpuinfo_arm_decode_vendor_uarch(
				arm_linux_processors[cluster_leader].midr,
#if CPUINFO_ARCH_ARM
				!!(arm_linux_processors[cluster_leader].features & CPUINFO_ARM_LINUX_FEATURE_VFPV4),
#endif
				&arm_linux_processors[cluster_leader].vendor,
				&arm_linux_processors[cluster_leader].uarch);
			} else {
				/* Cluster non-leader: copy vendor, uarch, MIDR, and frequency from cluster leader */
				arm_linux_processors[i].flags |= arm_linux_processors[cluster_leader].flags &
					(CPUINFO_ARM_LINUX_VALID_MIDR | CPUINFO_LINUX_FLAG_MAX_FREQUENCY);
				arm_linux_processors[i].midr = arm_linux_processors[cluster_leader].midr;
				arm_linux_processors[i].vendor = arm_linux_processors[cluster_leader].vendor;
				arm_linux_processors[i].uarch = arm_linux_processors[cluster_leader].uarch;
				arm_linux_processors[i].max_frequency = arm_linux_processors[cluster_leader].max_frequency;
			}
		}
	}

arm_linux_processors[i].uarch = arm_linux_processors[cluster_leader].uarch;

arm_linux_processors[i].uarch = arm_linux_processors[cluster_leader].uarch;

Use arm_linux_processors[0].uarch update arm_linux_processors[i].uarch(i from o0 to 7),Reulst in

arm_linux_processors[i].uarch is the same as update arm_linux_processors[0].uarch which is a55(0x00300355 in the cpuinfo.h)

cpuinfo_uarch_cortex_a55 = 0x00300355,

Please give me some advices.
BR.
Xiaotong

@xiaotongnii xiaotongnii changed the title Parser cpu a76 core as a55 result in serious performance problem on Arm CPU Parser cpu a76 core as a55 result in serious performance problem Oct 25, 2023
@xiaotongnii xiaotongnii changed the title Parser cpu a76 core as a55 result in serious performance problem Parse cpu a76 core as a55 result in serious performance problem Oct 25, 2023
@xiaotongnii
Copy link
Author

xiaotongnii commented Nov 2, 2023

Nobody reply?OK
Now,Cpuinfo should use logical cpu cluster device node to parse cpu cluster rather than use physical cpu cluster!!!
We nedd to add a new cpuinfo device node to parse logical cpu cluster,Such as package_leader_id and package_leader_id_list or logical_core_siblings_list

uint32_t package_leader_id;

We can not use a physical cpu cluster device node to parse directly ,because cpu arch is changing all the time.
BR.
Xiaotong

@malfet
Copy link
Contributor

malfet commented Nov 14, 2023

I think some of that is being added as part of RISC-V bringup, but I wonder if it could be landed separately...

@xiaotongnii
Copy link
Author

I think some of that is being added as part of RISC-V bringup, but I wonder if it could be landed separately...

Hi,malfet, what is means about " I wonder if it could be landed separately".
For Arm CPUs,I am willing to slove the issue.And RISV-V CPUS also has a similar programe.
How to make the issue become a PR?

@prashanthswami
Copy link
Collaborator

I suspect we're referring to PR #190 here, where we added the concept of core_leader_id and cluster_leader_id to the RISC-V structure definition.

In the RISC-V implementation, we tie the processor's uarch to the core_leader. This seems to be essentially the problem you're flagging on ARM - logical processors on the same physical core are guaranteed to have one uarch, but multiple physical cores tied to the same cluster don't necessarily have to be the same uarch.

I think what malfet@ meant by 'it could be landed separately' is whether #190 might somehow block you, because 2 weeks ago at the comment, it was not yet landed. However, these paths are completely independent, so you could put up a fix for this in ARM without worrying about RISC-V.

In regards to how to make this issue a PR, just create a PR separately and follow this guide to link your PR to this issue.

xiaotongnii added a commit to xiaotongnii/cpuinfo that referenced this issue Dec 9, 2023
@xiaotongnii
Copy link
Author

Application
uint32_t uarch_index = cpuinfo_get_current_uarch_index();-> const struct cpuinfo_uarch_info* cpuinfo_uarch_info = cpuinfo_get_uarch(uarch_index); (cpuinfo_uarchs,init->fill uarchs in it)-> cpuinfo_uarch_info->uarch ;(Type of CPU microarchitecture )

Cpuinfo:
sibling_package_leader_id ->package_leader_id -> processors[sibling].package_leader_id

cpuinfo_arm_linux_init->cpuinfo_linux_detect_core_siblings->cluster_siblings_parser

static bool cluster_siblings_parser(
	uint32_t processor, uint32_t siblings_start, uint32_t siblings_end,
	struct cpuinfo_arm_linux_processor* processors)
{
	processors[processor].flags |= CPUINFO_LINUX_FLAG_PACKAGE_CLUSTER;
	uint32_t package_leader_id = processors[processor].package_leader_id;

	for (uint32_t sibling = siblings_start; sibling < siblings_end; sibling++) {
		if (!bitmask_all(processors[sibling].flags, CPUINFO_LINUX_FLAG_VALID)) {
			continue;
		}
		const uint32_t sibling_package_leader_id = processors[sibling].package_leader_id;
		if (sibling_package_leader_id < package_leader_id) {
			// package_leader_id = sibling_package_leader_id;
		}
		// processors[sibling].package_leader_id = package_leader_id;
		processors[sibling].flags |= CPUINFO_LINUX_FLAG_PACKAGE_CLUSTER;

	}

	processors[processor].package_leader_id = package_leader_id;

	return true;
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants