PC Instability related to reset/power button failure

 

I know we've got some geeks on here; maybe someone can offer a perspective I'm missing.

About a month ago I started having strange issues on a PC built back in Q3 2019. I don't game on this machine; it's strictly for productivity tasks.

3900X
asus x570-e strix board
64gb ram
water cooled
etc etc.

The system would reboot in the middle of tasks, sometimes even when idling. Stripping it down to the core components did not solve the problem. Swapping around memory didn't help either.

The board has a 2-digit LED readout (Q-Code) along with 4 status LEDs. During POST it would get stuck on Q-Code 15 (Pre Memory System Agent Initialization has started) and the DRAM LED would be lit (amber). When this happened, it would fail to POST. Sometimes swapping memory would help, other times not. I should mention that prior to this the system was rock stable; I could go 30-90 days between reboots.

I use this machine for work, so I replaced key internals: CPU, board, power supply. I have two sets of RAM sticks and the issue was present with either set, so I didn't think RAM was the issue.

New parts worked for two weeks then the same problem returned last night. Same error codes. Real head scratcher.

The chance of new parts experiencing the same issue was highly unlikely. The only pcie device connected was the video card which exhibited no signs of trouble during operation - no artifacts, locking up, windows bsod, etc.

After more troubleshooting I had an epiphany. I removed the case power and reset button connectors from the motherboard. The board booted right up, no error codes, nothing. The switches are nothing special: normally open (NO) momentary push buttons with 2' leads terminating in a header that plugs into the motherboard.

The last time I experienced this was 5 years ago on a customer's box. That system would just not POST, period; it didn't show this inconsistent behavior. In my case the machine would work normally for a period, then just reboot as though the reset button had been pressed. Sometimes it would POST, other times not. I suppose it's possible the reset button had an intermittent short, and what I was experiencing was the equivalent of the reset button being held down while the power button was toggled?

The case in use is from 2008, so definitely possible the buttons have worn.

What I'm having difficulty wrapping my head around is why it would work for a length of time, sometimes days at a time, then just crap out. It's not like the reset button is pressed often. The whole box sits stationary (weighs ~60 lb with everything inside). The floor doesn't vibrate.

At this point with those two switches replaced (crazy how I had spares on hand), going on no issues for nearly 2 days.

On a meter, the reset button definitely failed. Showing 5-6 kΩ resistance when open, 0 Ω when closed.

Ideas?

intermittents are like that

Nice hardware!

This could be an intermittent hardware problem. It is difficult to test an intermittent. With the new power and reset buttons, see how it runs. If the problem goes away and you feel like it, replace the case.

With an intermittent, there is no assurance that the problem is solved. Watch it over the long haul.

My vote is to

Chalk it up to GREMLINS and continue on with SOP ASAP.

--
I never get lost, but I do explore new territory every now and then.

Agreed. I used to have up

Agreed.

I used to have uptimes of 30-90 days between reboots, rebooting when I remembered to. I sleep the machine at night but it's on at least 12 hrs a day, every day.

Most recently, it's even had difficulty waking up. Not so since changing those switches out.

It does make sense. The computer reset switch circuit likely has thresholds for what's considered an open and closed switch. Probably based on a certain voltage or current level through that circuit. Doubt there's anything in there for what happens when it's in between.

Replacing the case is not an option as this case is no longer available - antec 1200. Would be nice to get a modern version of this.

https://i.imgur.com/DitG8E2.jpg

It's a big case, even has a 360mm radiator strapped to the back side.

My theory on POST failure is effectively the reset button was being held down while it was trying to POST.

Bad switches can be tricky.

I never encountered a situation like that with any computer customer. However, my uncle was a car mechanic, and another auto repair shop gave him a car they could not fix. Sometimes the car would start, other times it wouldn't. The other shop had replaced the starter, fuel pump and ignition coils with no luck. My uncle checked the car out and could not find anything. At the end of the day he sat in the car, and was just playing with starting and shutting off the engine. He noticed that if the key was jiggled a certain way in the ignition switch, the car would not start. All of the problems were due to a faulty ignition switch. Bad switches sometimes make contact for no apparent reason and other times, for no apparent reason, don't.

Intermittents are very difficult to find.

Had an intermittent E-stop on a piece of equipment. I jumpered around the usual couple that were notorious for failures. All E-stops checked good. I finally jumpered out the entire circuit from the board and it ran. Plugged the E-stop circuit back in and it ran. I suspect it was a corroded and/or loose board connector.

--
Nuvi 2460LMT.

leaky

zx1100e1 wrote:

On a meter, the reset button definitely failed. Showing 5-6 kΩ resistance when open, 0 Ω when closed.

I imagine the open value should be megohms at the very least, so something was hideously wrong there.

--
personal GPS user since 1992

^^Exactly. Going on 3 days

^^Exactly. Going on 3 days now without a single reboot, crash or otherwise.

I've worked on lots of computers in the last 2 decades, have come across this issue twice now.

Somehow the switch became a resistor. Doubt the motherboard circuit was designed to handle that.
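
To put rough numbers on the "switch became a resistor" idea, here is a minimal sketch that treats the reset header as a pull-up resistor to a logic rail with the switch connecting the sense line to ground. Everything numeric in it is an assumption for illustration (a 3.3 V rail, a 10 kΩ pull-up, and 0.8 V / 2.0 V input thresholds); none of it comes from the ASUS board's documentation, and the real circuit may differ.

```python
# Illustrative only: V_RAIL, R_PULLUP, V_IL and V_IH are assumed values,
# not measurements or specifications for the board discussed in this thread.
V_RAIL = 3.3        # volts, assumed logic rail feeding the reset header
R_PULLUP = 10_000   # ohms, assumed pull-up on the reset sense line
V_IL = 0.8          # volts, at or below this the input reads "pressed" (assumed)
V_IH = 2.0          # volts, at or above this the input reads "released" (assumed)


def reset_pin_voltage(r_switch_ohms: float) -> float:
    """Voltage at the sense pin for a switch of the given resistance to ground.

    The pull-up and the switch form a plain voltage divider.
    """
    return V_RAIL * r_switch_ohms / (R_PULLUP + r_switch_ohms)


def classify(r_switch_ohms: float) -> str:
    """Return how the assumed input would interpret the switch resistance."""
    v = reset_pin_voltage(r_switch_ohms)
    if v <= V_IL:
        return "pressed"
    if v >= V_IH:
        return "released"
    return "undefined"  # between thresholds: behavior is not guaranteed


if __name__ == "__main__":
    cases = [("closed", 0.0), ("leaky open", 5_500.0), ("healthy open", 10e6)]
    for label, ohms in cases:
        print(f"{label:>12}: {reset_pin_voltage(ohms):.2f} V -> {classify(ohms)}")
```

Under these assumed values, a 5.5 kΩ "open" switch leaves the line at about 1.17 V, between the two thresholds, where small fluctuations in the leak resistance could plausibly tip it either way. A different pull-up value would shift that number, so treat it as a way of reasoning about the fault, not a measurement.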

Read The trace

Whatever operating system you are running is sure to have a comprehensive trace. Looking backwards in time from the boot events you will see numerous events for attempts at recovery. Earlier than that you will see the failure event.

If you haven't done this before you will find that the most difficult part is to recognize which events to ignore.
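
For anyone wanting a starting point, here is a rough sketch that assumes the machine runs Windows and uses the built-in `wevtutil` tool to pull the newest unclean-restart events from the System log. The helper name `build_query` is made up for this example. Events 41 (Kernel-Power) and 6008 (EventLog) only record that the previous shutdown was not clean; they do not say why, so they narrow down when a reset happened, not what caused it.

```python
import platform
import subprocess

# Event IDs Windows commonly logs after an unclean restart:
#   41   (Kernel-Power): the system rebooted without cleanly shutting down first
#   6008 (EventLog):     the previous system shutdown was unexpected
UNCLEAN_RESTART_IDS = (41, 6008)


def build_query(event_ids=UNCLEAN_RESTART_IDS, count=20):
    """Build a wevtutil command listing the newest matching System log events."""
    id_filter = " or ".join(f"EventID={i}" for i in event_ids)
    xpath = f"*[System[({id_filter})]]"
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",    # XPath filter on the event ID
        f"/c:{count}",    # maximum number of events to return
        "/rd:true",       # reverse direction: newest first
        "/f:text",        # human-readable output
    ]


if __name__ == "__main__":
    cmd = build_query()
    if platform.system() == "Windows":
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
        print(result.stdout or result.stderr)
    else:
        print("Not on Windows; the command would be:", " ".join(cmd))
```

Comparing the timestamps of these events against what the machine was doing at the time is usually the quickest way to decide which surrounding events can be ignored.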

Concerning the switch: All switches "bounce". If you aren't familiar with that do a search for "switch bounce". One failure mode is to bounce a long time and consequently create lots of interrupts.
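
As a small illustration of what bounce does to anything counting switch events, here is a sketch of a common software debounce: the output only changes after several consecutive samples agree. The sample data and function names are hypothetical, and this is not a description of how a motherboard's reset input works (that is handled in hardware); it only shows why a chattering contact can look like many presses.

```python
def debounce(samples, stable_count=3, initial=False):
    """Yield the debounced state for each raw sample.

    The output changes only after `stable_count` consecutive raw samples
    disagree with the current debounced state.
    """
    state = initial
    run = 0  # length of the current run of samples that disagree with state
    for raw in samples:
        if raw != state:
            run += 1
            if run >= stable_count:
                state = raw
                run = 0
        else:
            run = 0
        yield state


def count_presses(states, initial=False):
    """Count False -> True transitions in a sequence of states."""
    presses = 0
    previous = initial
    for s in states:
        if s and not previous:
            presses += 1
        previous = s
    return presses


if __name__ == "__main__":
    # Hypothetical raw samples: one press that chatters on the way down and up.
    raw = [bool(x) for x in [0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0]]
    print("raw presses:      ", count_presses(raw))
    print("debounced presses:", count_presses(debounce(raw)))
```

With this made-up sample, the raw signal shows three apparent presses while the debounced one shows a single press. A contact that chatters for longer than the filter window would still get through, which is the failure mode described above.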

@Minke Agreed, however the

@Minke

Agreed, however the OS doesn't register a reset button depress - or maybe it does, but because the system is reset nothing gets written into the logs. I don't recall seeing anything in the OS (windows) in terms of acpi event handlers for reset buttons - for power/sleep yes.

I'm familiar with switch bounce. It's a function of physically operating the switch. Don't see how it's directly applicable here as the switch isn't modulated when the reset happens.

Although, indirectly, perhaps the reset circuit does account for some level of bounce, which is why the resets happened so randomly. The measured resistance of the open switch did seem to fluctuate some.

It would be quite useful to see a scope waveform of what a small voltage passing through the switch looks like. That would confirm the above.

Intermittent shutdowns

With a 2008 case, it could very well have been the switches. That would not surprise me.

You have cleaned the dust out of this case thoroughly and have at least one, preferably 2+ spinning case fans, yes? You've got your cabling so it's not blocking airflow, yes? You didn't mention swapping out the CPU cooler and using a quality thermal paste--possibilities.

How many watts are on each of these two power supplies you've tried? Was the second power supply new? They do wear out. A name-brand 2019 PS should be fine, but older or off-brand models can get dicier.

You might try the free app version of HWMonitor and keep an eye on CPU temperatures and voltages. https://www.cpuid.com/softwares/hwmonitor.html

Since you've replaced the switches, I'm not sure a new case could help, since the switches and ventilation are the two ways a case can contribute to shutdown issues, but if you decide to go that route, I can recommend the Fractal Design Meshify C Mid-Tower Case. It will fit an ATX board, but since it's a bit on the compact design side, make sure your CPU cooler and video card will fit before ordering one; not all CPU coolers and video cards do. This case has excellent ventilation with user-added fans and excellent electronic components, and a great design for self-builds. It is limited in terms of what it can manage in terms of non-SSD internal hard drives and internal optical drives, so if you need those, check the specifications carefully.

--
"141 could draw faster than he, but Irving was looking for 143..."

Both supplies were name

Both supplies were name-brand, medium-to-upper quality units. The original was an EVGA G1+ 650W; the current one is a Corsair RM650x.

New case is not an option. I can't stand all the RGB garbage. The current case has 12 (twelve) 5 1/4" bays in the front. Each set of 3 is occupied by one of these - https://www.rosewill.com/rosewill-rsv-sata-cage-34-hard-disk.... It's a 4-bay hot-swap enclosure with a fan on the back. At present there are 11 physical drives. All drives are at 30-34C with ambient ~24.5C (76F).

The top 5 1/4" bay contains a USB port extension: 2x USB 2.0 Type-A, 2x USB 3.0 Gen1 Type-A, and 1x USB 3 Gen2 (10 Gbps) Type-C port. The next 2 bays contain a water cooling reservoir/pump. A 360x120x25mm radiator is attached to the back (outside the case). Cooling is not a problem :)

Fan wise, there are the 3 fans in the drive bays, 2 exhaust fans in the rear (of the case), 3 fans (external) pulling air through the radiator, and a fan in the slot section blowing on the SAS card. Oh, and the case itself has a 200mm fan up top exhausting. That's 10 fans total.

Well familiar with hwinfo64.. Here's some data for you.
https://i.imgur.com/VG4DAHc.png

I wouldn't mind a new case, but it has to have lots of 5 1/4" bays (10 minimum). This thing is steel and weighs 30+ lb without the drives, probably 55-60 lb with all the drives present. It's a boat anchor.

Here's an older case pic before I installed the top usb device. https://i.imgur.com/awj4Z9o.jpg

2days16h uptime so far.

ham?

zx1100e1 wrote:

... It's a boat anchor...

Are you a ham?

That's not nice

dobs108 wrote:
zx1100e1 wrote:

... It's a boat anchor...

Are you a ham?

Put me into the oven and roast me!

:)

--
Never argue with a pig. It makes you look foolish and it annoys the hell out of the pig!

bacon

dobs108 wrote:
zx1100e1 wrote:

... It's a boat anchor...

Are you a ham?

No, I just kept adding more drives when running out of room.

Back in the day...

Back in the day, a computer with all that hardware was called a mainframe.

Phil

--
"No misfortune is so bad that whining about it won't make it worse."

Room this box is in does

The room this box is in does stay warmer in the winter :)

3 1/2 days uptime LOL..

Crazy Computers

The worst computer I repaired had been shipped to its owner's locations in other countries (oil field worker) and was probably handled roughly. I fooled with it for a week, then finally removed the round motherboard battery for a couple of minutes and put it back, and that fixed the problem. Are there any problems with lights in your home going off when too many things are turned on? I have seen problems with transformers being under capacity out on the light pole.

This is interesting. i've

This is interesting. I've never come across this. Long, long ago I would have been happy to troubleshoot non-obvious issues like this. But I no longer have the patience. In fact I really disliked troubleshooting for friends and family, to the point where I just recommend a MacBook now.

But I do have to do troubleshooting for my gaming PCs. Can't get away from that.

@stan393 No power issues

@stan393 No power issues here. In fact all electronics are on a ups (computers, tv, networking equipment, etc).

Believe me, I tried that, clearing cmos.

@ceevee

I was running out of patience with this thing too. Leaving problems solved due to random events doesn't align with my way of thinking. I like tidy solutions. I knew there had to be some reason why it kept crapping out. As pointed out (either here or elsewhere), ram or power supply were the obvious culprits, but replacing both didn't fix it.

Speaking of odd,

I have an old Radeon 7770HD GPU that's been sitting on a shelf. Sure, it's mostly useless, but it has value when the CPU has no iGPU, to provide some level of video (tops out at 4K @ 30 fps, no HDR).

The problem: it's unsupported by current boards due to lack of UEFI signing within the BIOS. The hack around this is to retrieve the current BIOS, patch it, then flash it back.

My 'spare' box is a b550/5700g based system used mostly for dev/testing stuff. It's got an older PS inside without pcie (6pin/12v) plugs. In my collection of cables/adapters/dongles I found a 2x molex to 6 pin pcie. Great. In hooking it up, I only connected the single molex. Upon later inspection, each molex powers half the adapter.

Of course the system would not boot with it, but it also caused a weird side effect. With the card removed, I could no longer get the onboard video to work (white LED on the mainboard indicating VGA failure). Reinstalling the card with a fully powered PCIe plug yielded working DVI but no HDMI/DP (the card has all 3).

Tried resetting cmos, flashing newer (motherboard) bios, pulling ram, and probably some other things. Even tried several other vbios for the video card. Nothing worked.

Eventually fixed it. The solution is as bizarre as this thread. Stay tuned. We'll see if someone chimes in first. I'll share later in the am if not.

power button

I had the same problem for a while, but it stopped doing it and I have no idea why.

A couple comments - UPS are

A couple of comments. First, UPSes are not power conditioners unless specifically designed for it, and those are unlikely in non-commercial setups. A standard UPS doesn't filter anything and passes AC through as is unless/until it hits the trigger point.

Second, check the capacitors on your motherboard, GPU and other cards for a slight (or pronounced) bulge. They can operate for an indeterminate amount of time in a compromised state, but instability usually returns as they continue to age.

Just some ideas.

So in reference to the

So in reference to the borked onboard video issue 2 posts up.

The fix: move the discrete GPU to a different slot and power on the system. It does some PCIe slot enumeration (you can tell by the diag LEDs on the motherboard). Eventually it finishes. Shut the computer down and remove the GPU. Now onboard video works fine.

{mind blown}

I had the same thing happen

I had the same thing happen and finally tracked it down to my Antivirus (Bitdefender) program. I have no idea why, I am not really positive it was that, and I forgot exactly what I did to solve it.
However, start by stopping running programs one at a time to see if you can track down an errant program. Maybe a reinstall of that program will do it.