Skip to content
This repository has been archived by the owner on Apr 17, 2022. It is now read-only.

Further scav transport death crash investigation #3684

Closed
wzdev-ci opened this issue Aug 25, 2012 · 22 comments
Closed

Further scav transport death crash investigation #3684

wzdev-ci opened this issue Aug 25, 2012 · 22 comments

Comments

@wzdev-ci
Copy link
Contributor

resolution_fixed type_patch (an actual patch, not a request for one) | by NoQ


I've made some more digging into the crashes i have with my scav AI mod (where i make scavengers produce transporter-type droids), and here's what is showing up; i'm asking for help in investigating that, cause it may be appearing in the base game as well in some rare cases / who knows.

The game sometimes crashes after some of the scavenger transporter dies (probably on the next game tick or something like that), and never crashes on any other unit's death.

When a transporter droid dies, a some pointer to a droid object becomes invalid and makes the game crash in some random place of the code inside gameLoop (either in rendering or in updating game state).

I believe the bad pointer appears in src/droid.cpp in removeBaseDroid(), the part where it kills all droids inside the transporter. The transporter's psGroup somehow contains some scavenger tanks, which is impossible (it's not a supertransport and these tanks are not cyborgs but actually tanks), and their deletion leads to the crash.

I believe in that because when i change the debug statement inside this for(;;) loop to

debug(LOG_DEATH,"psCurr: %d %d %s; psDel: %d %d %s",
    psCurr->id,psCurr->droidType,objInfo(psCurr),
    psDel->id, psDel->droidType, objInfo(psDel)
);

then i have actually managed to obtain the following stderr (with --debug=death):

death   |02:28:27: [droidDamage:224] Droid 160883 (0xb50cb38) is toast
death   |02:28:27: [removeDroidBase:440] psCurr: 160901 0 Tank; psDel: 160883 6 Scavenger VTOL
info    |02:28:27: [destroyObject:272] destroyObject: object not found in list

Something like that shows up every time the crash happens (and there are no other known crash cases with this mod until now). The "Tank" name can only belong to a scavenger tank, and the droidType is 0 ("DROID_WEAPON"). So it couldn't have appeared in this loop at all, but still it is deleted in this loop.

One extra thing i found suspicious is that when i use the cheat mode to switch to the scavenger player and try to see what droids are inside the transporter, i'm unable to do that: the list UI just doesn't show up when i right-click the transporter. But this isn't true for the regular transporters i give to the scav player with Ctrl+O.

Scavenger transporters are produced by the following call in scavfact.js:

buildDroid(fac, name, body, "V-Tol", "", DROID_TRANSPORTER, weapon); 

Issue migrated from trac:3684 at 2022-04-16 10:35:25 -0700

@wzdev-ci
Copy link
Contributor Author

NoQ uploaded file warzone2100.gdmp-K5W1J5 (34.1 KiB)

The crash dump, one of the most interesting ones, that leads straight to the function mentioned above.

@wzdev-ci
Copy link
Contributor Author

NoQ uploaded file warzone2100.log (40.7 KiB)

Another crash with similar properties.

@wzdev-ci
Copy link
Contributor Author

NoQ uploaded file warzone2100.gdmp-N7fKXA (29.1 KiB)

Another crash with similar properties.

@wzdev-ci
Copy link
Contributor Author

vexed commented


While this is a bug, the real problem is, there are no checks being done script side, or source side, and the game assumes everything is OK.

If you look at how a human would normally do this, they would issue a DORDER_EMBARK order, then it does the checks there.

The old script code just checked for capacity, and if on mission or not, and also safety checks to make sure it is a transport, and a non null droid.

The new script code... don't do squat, and just assumes everything is fine.

I would say, for a quick fix, do some script side checking...

@wzdev-ci
Copy link
Contributor Author

NoQ changed _comment1 which not transferred by tractive

@wzdev-ci
Copy link
Contributor Author

NoQ changed _comment0 which not transferred by tractive

@wzdev-ci
Copy link
Contributor Author

NoQ commented


It sounds as if using js newGroup() during the game (rather than during the early initialisation) may lead to group id collisions. Transporter group id to js tank group id, in particular.

In fact i had quite a few problems with group id collisions while working on NullBot, that disappeared only when i made sure groups are only allocated on startup-time; otherwise it sometimes lead to "trying to add a tank to group belonging to the other player" js exception, and i never managed to find a bug on the script side that may lead to such a trouble.

But this particular Scav AI still allocates groups on run-time, and using it together with NullBot sometimes leads to group collision exceptions on both sides (sometimes in NullBot and sometimes in Scav AI, even though NullBot is free from it on its own).

So probably something similar leads to newGroup() returning an id already belonging to a chopper, or vice versa, an id already taken by a chopper is already in use by one of the js groups.

Note: this paradigm also avoids using the Cyp's Orange Fruit argument.

@wzdev-ci
Copy link
Contributor Author

NoQ changed _comment0 which not transferred by tractive

@wzdev-ci
Copy link
Contributor Author

NoQ commented


grpCreate() call without arguments (defaulting to -1) actually does lead to creating duplicate group ids.

@wzdev-ci
Copy link
Contributor Author

NoQ commented


The crash does not happen until there actually exists a group that was not found from js at grpFind() at some point (and thus re-created), but the group not found isn't usually equal to the transporter's group.

@wzdev-ci
Copy link
Contributor Author

NoQ changed _comment0 which not transferred by tractive

@wzdev-ci
Copy link
Contributor Author

NoQ commented


The faulty transporter is being assigned to one of the groups already created by JS at eventStartLevel(). JS doesn't ever free or release this group (in fact it has no function to do that).

@wzdev-ci
Copy link
Contributor Author

NoQ commented


I think i found it. When a last droid is deleted for a certain group, this group is removed - see void DROID_GROUP::remove(DROID *psDroid), at the end.

But JS still remembers its id in its variable, even if it's empty. But the rest of the game doesn't know about it. So probably some group gets empty, then when transporter is produced the game says hey let's assign this group id to it, then JS says hey i've had a group somewhere with this id and i want to add my tank to it again, then we get this tank deleted when the transporter dies, then we crash.

@wzdev-ci
Copy link
Contributor Author

NoQ commented


Is it ok if we just never delete groups? Or maybe add a flag into the group saying "don't delete me".

@wzdev-ci
Copy link
Contributor Author

NoQ changed type from bug to patch (an actual patch, not a request for one)

@wzdev-ci
Copy link
Contributor Author

NoQ changed _comment0 which not transferred by tractive

@wzdev-ci
Copy link
Contributor Author

NoQ commented


I propose the following fix.
Add this line to the middle of js_newGroup() function body in qtscriptfuncs.cpp:

newGrp->refCount++; // make sure the group is not deleted when it runs out of units

It will make sure JS groups are never deleted, any other groups are unaffected.

@wzdev-ci
Copy link
Contributor Author

vexed commented


Technically, no, it isn't ok to never delete groups, on shutdown.

As for the patch.. erm... reference counting is supposed to be used to help track down issues, so this is a kludge of sorts.

I'll let per fix it :)

@wzdev-ci
Copy link
Contributor Author

NoQ commented


on shutdown
Well, by "never delete groups" i meant "never delete groups simply because refCount==0" (that is, never allow removing a droid lead to deleting the group as well".

reference counting is supposed to be used to help track down issues
Hmm. The comment in group.h says
Number of objects in the group. Group is deleted if refCount<=0. Count number of droids+NULL pointers.
I wonder what are those NULL pointers in this context.
I'll let per fix it :)
ok ...

@wzdev-ci
Copy link
Contributor Author

NoQ commented


Here is how groups are created in wzscript (scriptvals.cpp):

// create a group structure for a ST_GROUP variable
bool scrvNewGroup(INTERP_VAL *psVal)
{
	DROID_GROUP		*psGroup;

	psGroup = grpCreate();

	// increment the refcount so the group doesn't get automatically freed when empty
	psGroup->add(NULL);

	psVal->v.oval = psGroup;

	return true;
}

The "psGroup->add(NULL);" call is completely equivalent to incrementing refCount. I don't see a reason not to do the same in js.

@wzdev-ci
Copy link
Contributor Author

Per changed status from new to closed

@wzdev-ci
Copy link
Contributor Author

Per changed resolution from `` to fixed

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

1 participant